Statistical Meta-Modeling for Complex System Simulations:
Kriging, Alternatives and Design

C. F. Jeff Wu
School of Industrial and Systems Engineering
Georgia Institute of Technology

Outline
• Statistical meta-modeling of computer experiments
• Kriging as the main tool: two examples of computer experiments
  - two levels of accuracy
  - with qualitative/quantitative factors
• Alternatives to kriging (to avoid numerical instability)
• Some novel design issues
Why computer experiments?
• They allow the study of systems where physical experiments are dangerous or infeasible, such as ammunition detonation.
1
Some examples
2
Statistical Meta-Modeling of Computer Experiments
[Flow diagram: physical experiment or observations → computer modeling (finite-element simulation) → surrogate model (kriging) → robustness, optimization; side arrows for noise simulations / error propagation and for more FEA runs.]
3
Kriging as an Interpolator
• Kriging is an interpolation method: ŷ(xi) = yi, xi ∈ R^p.
• Originated in geostatistics; proposed for computer experiments by Sacks, Welch et al.
4
Kriging Modeling
• Model assumption: Gaussian process, y(x) = μ(x) + Z(x).
  – mean: linear model μ(x) = f(x)'β, where f(x) is a vector of known regression functions.
  – correlation function, e.g., the Gaussian correlation function
    R(x, x') = exp{−Σj φj (xj − x'j)²}.
  – variance σ².
5
Best Linear Unbiased Predictor
• Best Linear Unbiased Predictor (BLUP):
  ŷ(x) = f(x)'β̂ + r(x)'R⁻¹(y − Fβ̂),
  – β̂ = (F'R⁻¹F)⁻¹F'R⁻¹y is the generalized least squares estimate,
  – r(x) = vector of correlations between the prediction point and the observed points.
• Interpolating property: ŷ(xi) = yi.
6
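The BLUP above can be sketched in a few lines. This is a minimal ordinary-kriging version, assuming a constant mean f(x) = 1 and the Gaussian correlation function; all names and parameter values are illustrative:

```python
import numpy as np

def gauss_corr(X1, X2, phi):
    """Gaussian correlation: R(x, x') = exp(-sum_j phi_j (x_j - x'_j)^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * phi).sum(axis=2)
    return np.exp(-d2)

def krige_blup(X, y, phi, Xnew):
    """Ordinary-kriging BLUP with constant mean (f(x) = 1)."""
    n = len(y)
    R = gauss_corr(X, X, phi)
    F = np.ones(n)
    # GLS estimate of the mean: beta = (F' R^-1 F)^-1 F' R^-1 y
    Ry = np.linalg.solve(R, y)
    RF = np.linalg.solve(R, F)
    beta = (F @ Ry) / (F @ RF)
    r = gauss_corr(Xnew, X, phi)     # r(x): correlations to observed points
    return beta + r @ np.linalg.solve(R, y - beta)

rng = np.random.default_rng(0)
X = rng.uniform(size=(8, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
yhat = krige_blup(X, y, np.array([2.0, 2.0]), X)
print(np.allclose(yhat, y))   # True: the BLUP interpolates the data
```

Predicting at the design points reproduces the data exactly, which is the interpolating property stated above.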
Kriging Modeling: Estimation
• Maximum Likelihood Estimation:
  min_{φ ≥ 0} { n log(σ̂²) + log(det(R)) },
  where σ̂² = (y − Fβ̂)'R⁻¹(y − Fβ̂)/n.
• Numerical stability: R is ill-conditioned, especially for large n and/or k (= input dimension) (Peng and Wu, 2010b).
• Computational complexity: matrix inversion is O(n³); many optimization iterations are needed, and the matrix inversion is recomputed in each iteration.
7
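The ill-conditioning of R is easy to demonstrate on a one-dimensional grid; this small check (illustrative, not from the slides) shows the condition number exploding as the design gets denser:

```python
import numpy as np

# As the design points get denser, the Gaussian correlation matrix R
# approaches singularity and its condition number explodes.
def cond_of_R(n, phi=1.0):
    x = np.linspace(0.0, 1.0, n)
    R = np.exp(-phi * (x[:, None] - x[None, :]) ** 2)
    return np.linalg.cond(R)

for n in (5, 10, 20):
    print(n, f"{cond_of_R(n):.2e}")   # condition number grows rapidly with n
```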
Examples of Kriging Models
• Computer experiments with two levels of accuracy
• Computer experiments with quantitative and qualitative factors
• Simulation codes with calibration parameters
8
Example: Designing Cellular Heat Exchangers for an Electronic Cooling Application
9
Heat Transfer Analysis
HE: Detailed Computer Simulation – Finite Element Analysis (FEA) Method
• Uses the computational fluid dynamics solver FLUENT.
• Problem domain is divided into thousands or millions of elements.
• Each run requires hours to days to complete.
10
Heat Transfer Analysis
LE: Approximate Computer Simulation – Finite Difference Method
• The finite difference technique is a numerical technique for solving 2- or 3-D steady-state heat transfer problems.
• Temperature distribution is approximated via numerical solution of the 3-D heat transfer equations using forward or central difference methods.
• Each run takes minutes to complete.
• Less accurate than FEA.
11
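The LE described above can be caricatured in a few lines: a 2-D steady-state plate solved by central differences with Jacobi iteration. Grid size and boundary temperatures are made-up values for illustration:

```python
import numpy as np

# Steady-state temperature on a square plate via Jacobi iteration on the
# 2-D Laplace equation; the top edge is held hot, the rest at zero.
def solve_plate(n=30, top=100.0, other=0.0, iters=5000):
    T = np.full((n, n), other)
    T[0, :] = top                         # hot top edge
    for _ in range(iters):
        T[1:-1, 1:-1] = 0.25 * (T[:-2, 1:-1] + T[2:, 1:-1]
                                + T[1:-1, :-2] + T[1:-1, 2:])
    return T

T = solve_plate()
# interior temperatures lie strictly between the boundary values,
# and points nearer the hot edge are warmer
print(0.0 < T[15, 15] < 100.0, T[1, 15] > T[28, 15])
```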
High-Accuracy and Low-Accuracy Experiments
• Paradigm shift: single experiment → multiple experiments with different levels of accuracy.
• A generic pair: high-accuracy experiment (HE) and low-accuracy experiment (LE).
• HE is more accurate but more expensive than LE.
• Examples:
– physical experiment vs. computer simulation
– detailed computer simulation vs. approximate computer simulation
1. Modeling: How to model and analyze data from HE and LE?
2. Experimental design: How to plan HE and LE?
12
Frequentist Approach in Qian et al. (2006, ASME)
Related to the autoregressive model in Kennedy and O'Hagan (2000)
• x = (x1, . . . , xk): design variables. Dl and Dh: sets of design points (xi) for LE and HE with Dh ⊂ Dl. yh and yl: outputs from HE and LE.
• Base Surrogate Model: yl(xi) = βl0 + Σj βlj xij + εl(xi), xi ∈ Dl, εl ∼ GP(0, σ²l, φl).
• Adjustment Model: yh(xi) = ρ(xi) yl(xi) + δ(xi), xi ∈ Dh,
  – scale adjustment: ρ(xi) = ρ0 + Σj ρj xij;
  – location adjustment: δ ∼ GP(δ0, σ²δ, φδ).
• Fitting: base surrogate model → ŷl; adjustment model → ρ̂ and δ̂;
  =⇒ final surrogate model: ŷh = ρ̂ ŷl + δ̂.
• Desirable property: ŷh interpolates yh(xi), xi ∈ Dh, if HE is deterministic.
13
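A toy version of the adjustment step, assuming constant ρ and δ and using the LE outputs themselves in place of a fitted LE surrogate; all data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=20)
y_l = np.sin(2 * np.pi * x)                 # LE outputs on D_l (synthetic)
idx_h = np.arange(0, 20, 4)                 # D_h: a nested subset of D_l
y_h = 1.3 * y_l[idx_h] + 0.5                # HE outputs on D_h (synthetic)

# least-squares fit of the adjustment model yh = rho * yl + delta
A = np.column_stack([y_l[idx_h], np.ones(len(idx_h))])
rho, delta = np.linalg.lstsq(A, y_h, rcond=None)[0]
print(round(rho, 2), round(delta, 2))       # 1.3 0.5
```

On these noiseless data the fit recovers the true scale and location adjustments exactly; in the actual method ρ(x) and δ(x) are modeled more flexibly.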
Bayesian Approach in Qian and Wu (2008, Technometrics)
• Use a flexible Bayesian hierarchical Gaussian process model (BHGP).
• Provides more flexible adjustment; can accommodate parameter uncertainty and measurement error of HE.
• BHGP:
  – Base Surrogate Model: yl(xi) = βl0 + Σj βlj xij + εl(xi), xi ∈ Dl, εl ∼ GP(0, σ²l, φl).
  – Flexible Adjustment Model: yh(xi) = ρ(xi) yl(xi) + δ(xi) + ε(xi), xi ∈ Dh,
    scale adjustment ρ ∼ GP(ρ0, σ²ρ, φρ),
    location adjustment δ ∼ GP(δ0, σ²δ, φδ),
    ε(x) ∼ N(0, σ²ε); ε = 0 ⇔ no measurement error.
14
Model Parameters and Priors for the BHGP Model
• Model parameters
  – Mean parameters θ1 = (βl, ρ0, δ0).
  – Variance parameters θ2 = (σ²l, σ²ρ, σ²δ, σ²ε).
  – Correlation parameters θ3 = (φl, φρ, φδ);
    complexity increases with the number of input factors (∵ dim = 3k).
• Priors
  – Normal for θ1.
  – Inverse-gamma for θ2.
  – Gamma for θ3.
15
Computational Algorithm
Step 1: Fit correlation parameters θ3.
  Solve a stochastic program
    max_{φρ,φδ} L2 = E_{p(τ1,τ2)} f(τ1, τ2)
  by the Sample Average Approximation (SAA) algorithm.
Step 2: Markov chain Monte Carlo (MCMC) sampling from p(θ1, θ2 | yl, yh, θ̂3).
• Rationale: conditional distributions for θ1, θ2 are in closed form, but not for θ3 given θ1, θ2.
• This is not a fully Bayesian procedure (to save computation for θ3).
16
Posterior Density of ρ0, δ0, σ²ρ, σ²δ
[Figure: posterior density plots in four panels — (a) ρ0, (b) σ²ρ, (c) δ0, (d) σ²δ.]
17
Location and Scale Change
• Symmetric location change δ0 with center at 0.85.
• Scale adjustment ρ0 has multiple modes, which may be explained by multiple laws.
18
Calibration of Computer Models
• Calibration = the process of fitting a computer model to observed data by adjusting input parameters of the model (Kennedy and O'Hagan, 2001).
• Computer model: yc(x, θ).
• Physical model: yp(x) = ρ yc(x, θ) + δ(x),
  – θ = calibration parameters, δ(x) = model inadequacy, ρ = adjustment term.
• Example 1 – Nuclear Accident: use data on a reactor fire to calibrate a plume model of radioactive deposition. Source term and deposition velocity = calibration parameters.
• Example 2 – Methane Combustion: a computer model of ignition delay of methane combustion. Calibration parameters = seven reaction rates.
19
GP with quanti/quali factors: Data Center Thermal Distribution
20
Configuration Variables for the Data Center Example
• Five quantitative factors: rack temperature rise, rack power, diffuser angle, diffuser flow rate, ceiling height.
• Three qualitative factors: diffuser location, hot-air return-vent location, power allocation.
21
GP Models with Quantitative and Qualitative Factors
(Qian, H. Wu, and C. F. J. Wu, 2008, Technometrics)
• Input factors: w = (x, z); I quantitative factors: x = (x1, . . . , xI); J qualitative factors: z = (z1, . . . , zJ); response value: y(w).
• Build a single GP model for both x and z, borrowing strength from all the observations:
  y(w) = Σm βm fm(w) + ε(w).
• How to specify fm? Use regression modeling involving x and z.
• How to specify ε (especially its correlation structure)?
• Related work: Bayesian hierarchical GP models (Han et al., 2009).
• Corresponding designs: sliced space-filling designs (Qian and Wu, 2009, Biometrika).
22
Construction of Correlation Functions for ε(w)
• Consider one qualitative factor z1 with m1 levels 1, . . . , m1.
  For u = 1, . . . , m1, define εu(x) = ε(x, u).
• Idea: envision a mean-zero multivariate process (ε1(x), . . . , εm1(x))' = Aη(x).
• A: an m1 × m1 matrix with unit row vectors.
• Elements of η(x): m1 independent processes with a common variance σ².
• For two input values w1 = (x1, z11) and w2 = (x2, z12),
  cor(ε(w1), ε(w2)) = τ_{z11,z12} Kφ(x1, x2),
  – Kφ(x1, x2): correlation between x1 and x2.
  – T1 = AA': an m1 × m1 correlation matrix for z1 (i.e., a positive definite matrix with unit diagonal elements).
23
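The construction T1 = AA' can be checked numerically: any A with unit row vectors yields a valid correlation matrix (unit diagonal, positive semidefinite). A small sketch with an arbitrary A:

```python
import numpy as np

# Build a cross-level correlation matrix T1 = A A' from an m1 x m1
# matrix A whose rows are unit vectors, as in the construction above.
rng = np.random.default_rng(2)
m1 = 4
A = rng.normal(size=(m1, m1))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # normalize rows to unit length
T1 = A @ A.T

print(np.allclose(np.diag(T1), 1.0))             # unit diagonal: True
print(np.all(np.linalg.eigvalsh(T1) >= -1e-12))  # positive semidefinite: True
```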
Alternatives to Kriging
1. Tweaking of kriging:
   – covariance matrix tapering (Kaufman et al., 2008),
   – rank reduction (Cressie and Johannesson, 2008),
   – regularized kriging (Peng and Wu, 2010a).
2. Adding a penalty to the likelihood (Li and Sudjianto, 2005): another approach to regularization.
24
Alternatives to Kriging (cont.)
3. Completely different ideas:
   – Radial basis interpolation functions (Buhmann, 2004).
   – Regression-based inverse distance weighting (Joseph and Kang, 2009): a fast interpolator.
   – Overcomplete basis surrogate method (OBSM) (Chen, Wang, and Wu, 2010): an approximator, not an interpolator.
25
Inverse Distance Weighting (IDW)
• Inverse Distance Weighting (Shepard, 1968):
  – ŷ(x) = Σi wi(x) yi, with weights wi(x) ∝ d(x, xi)^(−p), normalized to sum to one.
  – Interpolates the data: as x → xi, ŷ(x) → yi.
  – Simple computation but poor prediction.
26
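A minimal implementation of Shepard's IDW, using the standard normalized inverse-distance weights (the power p = 2 is an arbitrary choice here):

```python
import numpy as np

def idw(X, y, xnew, p=2):
    """Shepard's inverse distance weighting: weights d_i^-p, normalized."""
    d = np.linalg.norm(X - xnew, axis=1)
    if np.any(d == 0):                  # exact hit: return the observed value
        return float(y[np.argmin(d)])
    w = d ** (-p)
    return float(w @ y / w.sum())       # convex combination of the y_i

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(idw(X, y, np.array([0.0, 0.0])))  # 1.0: interpolates at design points
```

Because the prediction is a convex combination of the observed values, it can never extrapolate outside their range, which is one source of its poor prediction.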
Regression-Based Inverse Distance Weighting (RIDW)
• Add a regression part to IDW (Joseph and Kang, 2009):
  – ŷ(x) = g(x) + Σi wi(x){yi − g(xi)}, where g(x) = mean part; can be linear, nonlinear, or nonparametric.
  – The IDW weights are applied to the residuals from the fitted mean.
  – Faster convergence than IDW.
27
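One way to sketch the regression-based idea: fit a linear mean part, then apply IDW to the residuals. This illustrates the general recipe, not necessarily the exact estimator of Joseph and Kang (2009):

```python
import numpy as np

def ridw(X, y, xnew, p=2):
    """IDW applied to residuals from a linear mean part g(x)."""
    F = np.column_stack([np.ones(len(X)), X])      # basis for g(x)
    beta = np.linalg.lstsq(F, y, rcond=None)[0]    # fit the mean part
    resid = y - F @ beta                           # detrended data
    d = np.linalg.norm(X - xnew, axis=1)
    if np.any(d == 0):
        return float(y[np.argmin(d)])
    w = d ** (-p)
    g_new = float(np.concatenate(([1.0], xnew)) @ beta)
    return g_new + float(w @ resid / w.sum())

X = np.random.default_rng(3).uniform(size=(10, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]            # exactly linear data
print(round(ridw(X, y, np.array([0.5, 0.5])), 6))  # 3.5: mean part is exact
```

On exactly linear data the residuals vanish and the mean part alone gives the correct answer, unlike plain IDW.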
Comparisons Between RIDW and Kriging
28
Design of Experiments with HE and LE
• Key: efficiently allocate resources and acquire information from HE and LE.
• x = (x1, . . . , xk): design variables in [0,1]^k.
  Dl: set of design points (xi) for LE. Dh: set of design points (xi) for HE.
• Three principles for constructing Dl and Dh:
  – Economy: the size of Dh is less than the size of Dl.
  – Nested relationship: Dh is a subset of Dl.
  – Space-filling: points in Dh and Dl achieve uniformity in low dimensions.
• How to construct multiple experiments with respect to multiple requirements?
• New issue in design of experiments: traditional methods deal almost exclusively with experiments with one level of accuracy.
29
Nested Space-Filling Designs
Proposed in Qian, Tang, and Wu (2009), Statistica Sinica, and further developed in Qian, Ai, and Wu (2009), Annals of Statistics, and Qian (2009), Biometrika.
Ideas:
1. Use a special orthogonal array to construct an OA-based Latin hypercube design for Dl.
2. Choose Dh to be a carefully selected subset of Dl with two-dimensional balance.
Simple Galois field theory is needed for the construction.
30
Nested Orthogonal Array
Let A be an OA(N1, k, s1, t). Suppose A has a subarray with N2 rows, denoted by A1, and there is a projection φ that collapses the s1 levels of A into s2 levels. Further suppose A1 is an OA(N2, k, s2, t) if the s1 levels of its entries are collapsed to s2 levels according to φ. An array with this structure is called a nested orthogonal array.
31
Construction of Nested Space-Filling Design
• The nested orthogonal array A2 ⊂ A1 does not automatically generate nested space-filling designs if the s1 levels are arbitrarily labeled.
• In constructing the OA-based Latin hypercube D1 using A1, the s1 levels of A1, represented by the polynomials of degree u1 − 1 or lower, have to first be labeled as 1, . . . , s1.
• The subset D2 of D1 corresponding to A2 may not have good space-filling properties.
• Care should be taken in labeling the levels to ensure that D2 achieves stratification in two dimensions.
• The s1 levels of A1 must be labeled in such a way that each group of levels mapped to the same coarse level forms a consecutive subset of 1, . . . , s1.
32
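The consecutive-labeling rule can be sketched as a projection φ that collapses s1 = 8 levels to s2 = 4 by consecutive pairs (the values match the example that follows; the function name is illustrative):

```python
# Levels mapped to the same coarse level must be consecutive. With
# s1 = 8 and s2 = 4, one valid projection collapses consecutive pairs:
# {1,2} -> 1, {3,4} -> 2, {5,6} -> 3, {7,8} -> 4.
def collapse(level, s1=8, s2=4):
    group = s1 // s2                    # size of each consecutive group
    return (level - 1) // group + 1

print([collapse(l) for l in range(1, 9)])   # [1, 1, 2, 2, 3, 3, 4, 4]
```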
An Example of Nested Space-Filling Design
• Five factors x = (x1, x2, x3, x4, x5) taking values in the unit hypercube [0,1]^5.
• Use the orthogonal array in Table 1 to construct an OA-based Latin hypercube D1 for x1–x5.
• Choose a subarray D2 of D1 consisting of the 16 points in D1 corresponding to runs 1-4, 9-12, 17-20, 25-28 of the array in Table 1.
33
The Underlying Nested Orthogonal Array
Run # x1 x2 x3 x4 x5 Run # x1 x2 x3 x4 x5
1 1 1 1 1 1 33 8 1 8 8 8
2 1 3 3 5 7 34 8 3 6 4 2
3 1 5 5 8 4 35 8 5 4 1 5
4 1 7 7 4 6 36 8 7 2 5 3
5 1 8 8 7 2 37 8 8 1 2 7
6 1 6 6 3 8 38 8 6 3 6 1
7 1 4 4 2 3 39 8 4 5 7 6
8 1 2 2 6 5 40 8 2 7 3 4
9 3 1 3 3 3 41 6 1 6 6 6
10 3 3 1 7 5 42 6 3 8 2 4
11 3 5 7 6 2 43 6 5 2 3 7
12 3 7 5 2 8 44 6 7 4 7 1
13 3 8 6 5 4 45 6 8 3 4 5
14 3 6 8 1 6 46 6 6 1 8 3
15 3 4 2 4 1 47 6 4 7 5 8
16 3 2 4 8 7 48 6 2 5 1 2
34
Run # x1 x2 x3 x4 x5 Run # x1 x2 x3 x4 x5
17 5 1 5 5 5 49 4 1 4 4 4
18 5 3 7 1 3 50 4 3 2 8 6
19 5 5 1 4 8 51 4 5 8 5 1
20 5 7 3 8 2 52 4 7 6 1 7
21 5 8 4 3 6 53 4 8 5 6 3
22 5 6 2 7 4 54 4 6 7 2 5
23 5 4 8 6 7 55 4 4 1 3 2
24 5 2 6 2 1 56 4 2 3 7 8
25 7 1 7 7 7 57 2 1 2 2 2
26 7 3 5 3 1 58 2 3 4 6 8
27 7 5 3 2 6 59 2 5 6 7 3
28 7 7 1 6 4 60 2 7 8 3 5
29 7 8 2 1 8 61 2 8 7 8 1
30 7 6 4 5 2 62 2 6 5 4 7
31 7 4 6 8 5 63 2 4 3 1 4
32 7 2 8 4 3 64 2 2 1 5 6
Table 1. An OA(64,5,8,2) that contains an OA(16,5,4,2); i.e., the submatrix consisting of runs 1-4, 9-12, 17-20, 25-28 becomes an OA(16,5,4,2) after suitable level collapsing.
35
2-D Projection of D1
[Figure: pairwise scatterplot matrix of the projections of D1 onto (x1, x2, x3), each axis on [0, 1].]
36
2-D Projection of D2
x1
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
x2
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
x3
37
Conclusions
• Computer simulations have become popular in studying complex systems.
• Statistics-based meta (surrogate, approximate) models are useful companions to simulations.
• A rich class of problems: modeling, estimation, prediction, calibration, computations, design.
• Kriging is popular for building meta models but can have problems with numerical instability.
• Effective alternatives to kriging have emerged in recent work; the area is cross-disciplinary in nature.
38
Step 1: Fitting Correlation Parameters θ3
• Obtain the posterior mode θ̂3 = (φ̂l, φ̂ρ, φ̂δ) by solving
  (P): max_{φl,φρ,φδ} p(θ3 | yh, yl).
• (P) is equivalent to two separable problems:
  – (P1): a nonlinear program for φl,
  – (P2): a problem for φρ and φδ whose objective function involves an integral.
• (P2) ⇔ a stochastic program (P′2): max_{φρ,φδ} L2 = E_{p(τ1,τ2)} f(τ1, τ2).
• Solve (P′2) by the Sample Average Approximation (SAA) algorithm:
  1. Generate random samples (τ1^s, τ2^s) from p(τ1, τ2), s = 1, . . . , S.
  2. Solve the approximate problem:
     (φ̂ρ, φ̂δ) = argmax_{φρ,φδ} [ L̂2 = (1/S) Σ_{s=1}^S f(τ1^s, τ2^s) ].
39
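The SAA step can be sketched with a one-dimensional stand-in: replace the expectation by a sample average and optimize the resulting deterministic function (grid search here for clarity; the objective and the sampling distribution are toy choices, not the BHGP ones):

```python
import numpy as np

rng = np.random.default_rng(4)
S = 2000
tau = rng.exponential(scale=1.0, size=S)     # draws standing in for p(tau)

def f(phi, tau):                             # toy objective f(phi, tau)
    return -(phi - tau) ** 2

grid = np.linspace(0.0, 3.0, 301)            # grid search over phi
L2_hat = np.array([np.mean(f(phi, tau)) for phi in grid])
phi_hat = grid[np.argmax(L2_hat)]
print(round(phi_hat, 2))                     # close to the sample mean of tau
```

For this toy objective the SAA maximizer is the grid point nearest the sample mean of the draws, so the approximation error shrinks as S grows.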
Step 2: Markov Chain Monte Carlo (MCMC)
Sampling from p(θ1, θ2 | yl, yh, θ̂3)
• Most full conditional distributions for θ1, θ2 are regular:
  – p(βl | yl, yh, θ3, rest) ∼ Normal,
  – p(ρ0 | yl, yh, θ3, rest) ∼ Normal,
  – p(δ0 | yl, yh, θ3, rest) ∼ Normal,
  – p(σ²l | yl, yh, θ3, rest) ∼ IG,
  – p(σ²ρ | yl, yh, θ3, rest) ∼ IG.
• The conditional of (τ1, τ2), where τ1 = σ²δ/σ²ρ and τ2 = σ²δ/σ²ε, is not:
  p(τ1, τ2 | yl, yh, rest) ∝ τ1^−(αδ+3/2) τ2^−(αε+1)
    × exp{ −(1/τ1)[γδ/σ²ρ + (δ0 − uδ)²/(2 vδ σ²ρ)] − γε/(τ2 σ²ρ) }
    × |M|^(−1/2) exp{ −(yh − ρ0 yl1 − δ0 1n1)' M⁻¹ (yh − ρ0 yl1 − δ0 1n1)/(2σ²ρ) }.
• Use the Metropolis-within-Gibbs algorithm.
40
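A Metropolis-within-Gibbs skeleton on a toy target (not the BHGP posterior): one coordinate has a closed-form Normal conditional and is drawn directly; the other is known only up to a constant and gets a random-walk Metropolis step:

```python
import numpy as np

# Toy joint density pi(theta1, theta2) ∝ exp{-(theta1-theta2)^2/2 - theta2^4/4}:
# theta1 | theta2 ~ N(theta2, 1) exactly (Gibbs step), while theta2's
# conditional is non-standard (Metropolis step).
rng = np.random.default_rng(5)

def log_cond_theta2(theta2, theta1):          # known only up to a constant
    return -0.5 * (theta1 - theta2) ** 2 - 0.25 * theta2 ** 4

theta1, theta2 = 0.0, 0.0
draws = []
for _ in range(20000):
    theta1 = rng.normal(theta2, 1.0)          # Gibbs step: exact draw
    prop = theta2 + rng.normal(0.0, 0.8)      # random-walk proposal
    if np.log(rng.uniform()) < (log_cond_theta2(prop, theta1)
                                - log_cond_theta2(theta2, theta1)):
        theta2 = prop                          # Metropolis accept
    draws.append((theta1, theta2))

m = np.mean(draws, axis=0)
print(np.abs(m) < 0.2)   # both sample means near 0 by symmetry of the target
```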
The General Case
• Consider J qualitative factors z1, . . . , zJ, with zj having mj levels 1, . . . , mj.
• For two input values w1 = (x1, z1) and w2 = (x2, z2),
  cor(ε(w1), ε(w2)) = [ Π_{j=1}^J τ_{j,zj1,zj2} ] exp{ −Σ_{i=1}^I φi (xi1 − xi2)² }.
• Tj: an mj × mj correlation matrix for zj (i.e., a positive definite matrix with unit diagonal elements).
• This correlation function has a product form.
• Assume the elements of Tj ≥ 0 for deterministic computer experiments.
41
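The product-form correlation above can be coded directly; the matrix T1 and the parameter values below are illustrative:

```python
import numpy as np

# Product-form correlation for mixed inputs: Gaussian part for the
# quantitative x, entries of the T_j matrices for the qualitative z.
def mixed_corr(w1, w2, phi, T_list):
    x1, z1 = w1
    x2, z2 = w2
    quant = np.exp(-np.sum(phi * (x1 - x2) ** 2))
    quali = np.prod([T[z1[j], z2[j]] for j, T in enumerate(T_list)])
    return quali * quant

phi = np.array([1.0, 2.0])
T1 = np.array([[1.0, 0.4], [0.4, 1.0]])          # one 2-level quali factor
w1 = (np.array([0.2, 0.5]), [0])
w2 = (np.array([0.2, 0.5]), [1])
print(mixed_corr(w1, w2, phi, [T1]))             # 0.4: same x, different z
```

With identical quantitative inputs the Gaussian part is 1, so the correlation reduces to the cross-level entry of T1, as the formula requires.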
Some Restrictive Forms of Tj
Consider two input values w1 = (x1, z1) and w2 = (x2, z2) with responses y(w1) and y(w2). Recall that τ_{j,zj1,zj2} is the correlation between levels zj1 and zj2.
1. Isotropic correlation function: τ_{j,zj1,zj2} = exp{−θj I[zj1 ≠ zj2]}.
   For w1 and w2,
   cor(ε(w1), ε(w2)) = exp{ −Σ_{i=1}^I φi (xi1 − xi2)² − Σ_{j=1}^J θj I[zj1 ≠ zj2] },
   i.e., Euclidean distance for the xi and 0-1 distance for the zj.
2. Multiplicative correlation function:
   τ_{r,s} = exp{−θ_{r,s}} = exp{−(θr + θs) I[r ≠ s]}.   (1)
3. Group correlation functions.
4. Correlation functions for ordinal qualitative factors.
42
Estimation
• Model parameters:
  – mean parameters β = (β1, . . . , βp),
  – variance parameter σ²,
  – correlation parameters φ = (φ1, . . . , φI)' and T = {T1, . . . , TJ}.
• The estimation iterates between:
  – Regression fitting: given φ̂ and T̂, estimate β and σ². Simple!
  – Correlation fitting: given β̂ and σ̂², let U = (u1, . . . , un) with ui = [yi − β̂'f(wi)]/σ̂, and fit a GP with mean zero and variance one to the transformed data U.
43