Visualising the Input Space of a Galaxy Formation Simulation · 2012-04-16 · UQ12 Minitutorial
TRANSCRIPT
Visualising the Input Space of a Galaxy Formation Simulation
UQ12 Minitutorial
Presented by: Tony O’Hagan,
Peter Challenor, Ian Vernon
Overview
UQ12 minitutorial - session 6 2 / 68
• In many scientific disciplines, complex computer simulators are employed to help understand corresponding real-world physical processes (e.g. climate models are used to analyse climate).
• These simulators, referred to as Computer Models, share many attributes and also many problems.
• Often they take a long time to run, and require the specification of a large number of input parameters x.
• An area of Statistics has arisen to deal with such models, and is mainly centred around the construction of ‘Emulators’: fast stochastic approximations to the Computer Model.
• Often the most pressing question is: are there any inputs that give acceptable matches between the model output and observed data?
Overview
• We are going to History Match a galaxy formation simulation known as Galform.
• This involves learning about acceptable inputs x to the Galform model, using observed data z.
• We use emulators and implausibility measures to cut out input space iteratively.
• We will discuss relevant uncertainties: model discrepancy, observational errors, function uncertainty etc.
• Finally, we will consider various visualisation issues: even when we can identify the acceptable input space, visualisation is difficult.
• The approach described is completely general, and can be used for any model that is relatively slow to run and requires lots of inputs.
Why History Match?
• History Matching is an efficient technique that seeks to identify the set X of all acceptable inputs x.
• Often X only occupies a tiny fraction of the original input space.
• This set X may be empty: we do not presuppose that any such inputs exist.
• This is the main difference between History Matching and the related technique of Probabilistic Calibration.
• The latter is a useful technique, but assumes a single ‘best input’ and gives its posterior distribution.
Andromeda Galaxy and Hubble Deep Field View
• Andromeda Galaxy: the closest large galaxy to our own Milky Way; contains 1 trillion stars.
• Hubble Deep Field: one of the furthest images yet taken. Covers 2 millionths of the sky but contains over 3000 galaxies.
The Galform Model
• The Cosmologists at the ICC are interested in modelling galaxy formation in the presence of Dark Matter.
• First a Dark Matter simulation is performed over a volume of (1.63 billion light years)^3. This takes 3 months on a supercomputer.
• Galform takes the results of this simulation, includes the more realistic physics of ‘normal matter’, and models the evolution and attributes of approximately 1 million galaxies.
• Galform requires the specification of 17 unknown inputs in order to run.
• It takes approximately 1 day to complete 1 run (using a single processor).
• The Galform model produces lots of outputs, some of which can be compared to observed data from the real Universe.
Galform: Which Inputs to Use?
• PROBLEM: We want to identify the set of all inputs X that lead to acceptable matches between model outputs and observed data, given all relevant uncertainties.
• A 17-dimensional input space is large! If we did the simplest grid-based search (setting each input to its max or min), we would require 2^17 runs.
• This would take approximately 360 years to complete (on one processor)!
• We would really want a higher definition, so would want say 10^17 runs... This would take far longer than the current age of the Universe.
• SOLUTION: Construct an Emulator, which is a stochastic function that approximates the Galform model, and is fast to evaluate.
• Use the Emulator to find the acceptable inputs.
The Dark Matter Simulation
[Figure slide]
The Galform Model
[Figure slide]
Galform Outputs: The Luminosity Functions
• Galform provides multiple output data sets.
• Initially we analyse the luminosity functions, which give the number of galaxies per unit volume, for each luminosity.
• Bj Luminosity: corresponds to the density of young (blue) galaxies.
• K Luminosity: corresponds to the density of old (red) galaxies.
Input Parameters
• To perform one run, we need to specify numbers for each of the following 17 inputs:

vhotdisk:    100 - 550      VCUT:      20 - 50
aReheat:     0.2 - 1.2      ZCUT:      6 - 9
alphacool:   0.2 - 1.2      alphastar: -3.2 - -0.3
vhotburst:   100 - 550      tau0mrg:   0.8 - 2.7
epsilonStar: 0.001 - 0.1    fellip:    0.1 - 0.35
stabledisk:  0.65 - 0.95    fburst:    0.01 - 0.15
alphahot:    2 - 3.7        FSMBH:     0.001 - 0.01
yield:       0.02 - 0.05    eSMBH:     0.004 - 0.05
tdisk:       0 - 1

• What input values should we choose to get ‘acceptable’ outputs?
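The ranges above can be held in code for sampling candidate runs. A minimal sketch: the names and ranges come straight from the table, while the helper `random_run_inputs` is a hypothetical convenience, not part of Galform.

```python
# The 17 Galform input ranges from the table above, stored for sampling.
import random

input_ranges = {
    "vhotdisk": (100, 550),      "VCUT": (20, 50),
    "aReheat": (0.2, 1.2),       "ZCUT": (6, 9),
    "alphacool": (0.2, 1.2),     "alphastar": (-3.2, -0.3),
    "vhotburst": (100, 550),     "tau0mrg": (0.8, 2.7),
    "epsilonStar": (0.001, 0.1), "fellip": (0.1, 0.35),
    "stabledisk": (0.65, 0.95),  "fburst": (0.01, 0.15),
    "alphahot": (2, 3.7),        "FSMBH": (0.001, 0.01),
    "yield": (0.02, 0.05),       "eSMBH": (0.004, 0.05),
    "tdisk": (0, 1),
}

def random_run_inputs(seed=None):
    """Draw one candidate input vector x uniformly from the 17-D box."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in input_ranges.items()}

x = random_run_inputs(seed=1)
```

Uniform sampling like this is only a starting point; the talk uses space-filling designs instead.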
Galform Outputs: The Luminosity Functions
• Basic problem is that we pick inputs:
• vhotdisk = 290.5, aReheat = 1.15, alphacool = 0.31, ...
• And find that after 1 Day of Runtime: 1st run is rubbish.
• We pick inputs: vhotdisk = 223.3, aReheat = 0.49, alphacool = 1.12, ...
• And find that after 2 Days of Runtime: 2nd run is rubbish.
• We pick inputs: vhotdisk = 349.7, aReheat = 0.21, alphacool = 1.08, ...
• And find that after 3 Days of Runtime: 3rd run is rubbish.
• Pick 20 inputs and find after 20 Days of Runtime: all runs are rubbish.
• Pick 40 inputs and find after 40 Days of Runtime: all runs are rubbish.
• Pick 60 inputs and find after 60 Days of Runtime: all runs are rubbish.
11 Outputs Chosen
• We do 1000 runs using carefully chosen inputs (a space-filling maximin Latin hypercube design).
• (Again all runs are found to be unacceptable.)
• We choose 11 outputs that are representative of the luminosity functions, and emulate the functions fi(x).
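A maximin Latin hypercube of the kind used for the 1000 runs can be sketched as follows. This is a crude best-of-many construction on the unit cube, assuming points are later rescaled to the input ranges; real designs use more sophisticated optimisation.

```python
import random

def latin_hypercube(n, d, rng):
    """One LHC sample: each column is a random permutation of n strata,
    jittered within each stratum, so every 1-D margin is evenly covered."""
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        cols.append([(p + rng.random()) / n for p in perm])
    return [[cols[j][i] for j in range(d)] for i in range(n)]

def min_pairwise_dist(design):
    """Smallest Euclidean distance between any two design points."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for i, p in enumerate(design) for q in design[i + 1:]
    )

def maximin_lhc(n, d, tries=50, seed=0):
    """Crude maximin: generate several LHCs, keep the most space-filling."""
    rng = random.Random(seed)
    return max((latin_hypercube(n, d, rng) for _ in range(tries)),
               key=min_pairwise_dist)

design = maximin_lhc(n=20, d=17)   # points in [0,1]^17; rescale to the input ranges
```

Maximising the minimum pairwise distance spreads the runs out, which matters when each run costs a day.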
Linking Model to Reality
• We represent the model (Galform) as a function, which maps the vector of 17 inputs x to the vector of 11 outputs f(x).
• We use the “Best Input Approach” to link the model f(x) to the real system y (i.e. the real Universe) via:

y = f(x+) + d

where we define d to be the model discrepancy and assume that d is independent of f and x+.
• Finally, we relate the true system y to the observational data z by

z = y + e

where e represents the observational errors.
• We will use the Bayes Linear methodology, which only involves expectations, variances and covariances.
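The two linking equations can be made concrete with a toy simulation. Everything here is a made-up stand-in (the function f, the best input x+, and the standard deviations), purely to show how y and z relate to f(x+):

```python
# Toy sketch of the Best Input Approach:
#   y = f(x+) + d   (system = model at best input + model discrepancy)
#   z = y + e       (data = system + observational error)
import random

rng = random.Random(0)

def f(x):
    """Stand-in 'model': any deterministic vector-valued function of x."""
    return [xi ** 2 for xi in x]

x_plus = [0.3, 0.7]       # hypothetical best input (unknown in practice)
sd_d, sd_e = 0.05, 0.02   # assumed discrepancy / observation-error std devs

y = [fi + rng.gauss(0, sd_d) for fi in f(x_plus)]   # y = f(x+) + d
z = [yi + rng.gauss(0, sd_e) for yi in y]           # z = y + e
```

In the Bayes Linear treatment only Var[d] and Var[e] (and the corresponding expectations) are needed, not full distributions.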
Galform: Emulation
• For each of the 11 outputs we pick active variables xA, then emulate univariately (at first) using:

fi(x) = Σj βij gij(xA) + ui(xA) + δi(x)

• The Σj βij gij(xA) term is a 3rd-order polynomial in the active inputs.
• ui(xA) is a Gaussian process.
• The nugget δi(x) models the effects of inactive variables as random noise.
• The ui(xA) have covariance structure given by:

Cov(ui(xA1), ui(xA2)) = σi² exp[ −|xA1 − xA2|² / θi² ]

• The emulators give the expectation E[fi(x)] and variance Var[fi(x)] at point x for each output i = 1, ..., 11, and are fast to evaluate.
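A minimal sketch of the Gaussian-process part of such an emulator, using the squared-exponential covariance above. This is a zero-mean GP: the 3rd-order polynomial surface and the Bayes Linear updating used in the talk are omitted, and the hyperparameters (sigma2, theta, nugget) are illustrative placeholders.

```python
import numpy as np

def sq_exp_cov(X1, X2, sigma2, theta):
    """Cov(u(x1), u(x2)) = sigma^2 * exp(-|x1 - x2|^2 / theta^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-d2 / theta ** 2)

class GPEmulator:
    def __init__(self, sigma2=1.0, theta=1.0, nugget=1e-6):
        self.sigma2, self.theta, self.nugget = sigma2, theta, nugget

    def fit(self, X, y):
        """Condition on training runs (X, y); nugget plays the role of delta."""
        self.X, self.y = X, y
        K = sq_exp_cov(X, X, self.sigma2, self.theta)
        K += self.nugget * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)
        return self

    def predict(self, Xs):
        """Return E[f(x)] and Var[f(x)] at the new points Xs."""
        Ks = sq_exp_cov(Xs, self.X, self.sigma2, self.theta)
        mean = Ks @ self.K_inv @ self.y
        Kss = sq_exp_cov(Xs, Xs, self.sigma2, self.theta)
        var = np.diag(Kss - Ks @ self.K_inv @ Ks.T)
        return mean, var
```

The key property is that `predict` costs a few matrix multiplications, versus a day per Galform run.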
Emulation: a 1D Example
[Sequence of figure slides building up a 1D emulation example]
Model Discrepancy
Before calculating the implausibility we need to assess the Model Discrepancy and Measurement Error:

Model Discrepancy: Var[d] = Φ40 + Φ9 + ΦE

• Φ40: discrepancy term due to choosing the first 40 sub-volumes from the full 512 sub-volumes. Assess this by repeating 100 runs but now choosing 40 random regions. More advanced: exchangeable models paper.
• Φ9: as we have initially neglected 9 parameters (due to expert advice), we need to assess the effect of this (by running a Latin hypercube design across all 17 parameters).
• ΦE: expert assessment of the model discrepancy of the full model with 17 parameters and using 512 sub-volumes.

It is straightforward to find the multivariate expressions for Φ40 and Φ9, but ΦE requires more careful thought.
Model Discrepancy: Subjective ΦE
• Experts assert that there are clear ways that the model could be defective.
• The model predicts too many (or too few) galaxies. This would lead to a highly correlated model discrepancy across all outputs.
• The model systematically gets the colours of galaxies wrong: this results in too few (too many) blue galaxies and too many (too few) red galaxies, giving a negatively correlated model discrepancy between outputs from the differently coloured (bj and K) luminosity graphs.
• We therefore assume the model discrepancy term ΦE has the form:

         | 1  b  ..  c  ..  c |
         | b  1  ..  c  ..  c |
ΦE = a   | :  :   :  :   :  : |
         | c  ..  c  1  b  .. |
         | c  ..  c  b  1  .. |
         | :  :   :  :   :  : |

• Obtain values for a, b and c from expert assessment.
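The assumed block structure is easy to build in code: correlation b within each luminosity graph, c between the two graphs, scaled by a. The split of the 11 outputs into 6 bj and 5 K outputs here is an assumption for illustration, not taken from the talk.

```python
import numpy as np

def phi_E(a, b, c, n_bj=6, n_K=5):
    """Assumed ΦE structure: unit diagonal, b within each luminosity-graph
    block, c between blocks, all scaled by a. Block sizes are illustrative."""
    n = n_bj + n_K
    M = np.full((n, n), float(c))   # cross-graph entries c everywhere...
    M[:n_bj, :n_bj] = b             # ...then b within the bj block
    M[n_bj:, n_bj:] = b             # ...and b within the K block
    np.fill_diagonal(M, 1.0)        # unit diagonal before scaling
    return a * M
```

With a negative c this encodes the "wrong colours" defect: the bj and K discrepancies pull in opposite directions.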
Expert Assessment of ΦE: Elicitation Tool
• We obtain expert assessments of a, b and c using an elicitation tool.
Observational Errors
Observational errors Var[e] are composed of 4 parts:
• Normalisation Error: correlated vertical error on all luminosity output points.
• Luminosity Zero Point Error: correlated horizontal error on all luminosity points.
• k + e Correction Error: outputs have to be corrected for the fact that galaxies are moving away from us at different speeds (light is red-shifted), and for the fact that galaxies are seen in the past (as light takes millions of years to reach us).
• Galaxy Production Error: an assumed Poisson process to describe galaxy production.
The multivariate form for each of these quantities is straightforward(!) to calculate.
Implausibility Measures (Univariate)
We can now calculate the implausibility I(i)(x) at any input parameter point x for each of the i = 1, ..., 11 outputs. This is given by:

I²(i)(x) = |E[fi(x)] − zi|² / (Var[fi(x)] + Var[di] + Var[ei])

• E[fi(x)] and Var[fi(x)] are the emulator expectation and variance.
• zi are the observed data, and Var[di] and Var[ei] are the (univariate) Model Discrepancy and Observational Error variances.
• Large values of I(i)(x) imply that we are highly unlikely to obtain acceptable matches between model output and observed data at input x.
• Small values of I(i)(x) do not imply that x is good!
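The formula above translates directly into code. A sketch, taking the per-output emulator summaries and variances as given inputs; the 3-sigma cutoff shown later is included here for completeness:

```python
import math

def implausibility(E_f, Var_f, z, Var_d, Var_e):
    """Univariate I_(i)(x) = |E[f_i(x)] - z_i| / sqrt(sum of variances)."""
    return abs(E_f - z) / math.sqrt(Var_f + Var_d + Var_e)

def I_max(E_fs, Var_fs, zs, Var_ds, Var_es):
    """Maximized measure I_M(x) = max_i I_(i)(x) over the outputs."""
    return max(implausibility(*args)
               for args in zip(E_fs, Var_fs, zs, Var_ds, Var_es))

def non_implausible(E_fs, Var_fs, zs, Var_ds, Var_es, c_M=3.0):
    """Pukelsheim-style cutoff: keep x iff I_M(x) < c_M."""
    return I_max(E_fs, Var_fs, zs, Var_ds, Var_es) < c_M
```

Note the asymmetry stressed above: failing the cutoff rules x out, but passing it only means x has not yet been ruled out.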
Implausibility Measures (Univariate)
• We can combine the univariate implausibilities across the 11 outputs by maximizing over outputs:

IM(x) = max_i I(i)(x)

• We can then impose a cutoff IM(x) < cM in order to discard regions of input parameter space that we now deem to be implausible.
• The choice of cutoff cM is often motivated by Pukelsheim’s 3-sigma rule.
• We may simultaneously employ other choices of implausibility measure: e.g. multivariate, second maximum etc.
Multivariate Implausibility Measure
• As we have constructed a multivariate model discrepancy, we can define a multivariate implausibility measure:

I²(x) = (E[f(x)] − z)ᵀ Var[f(x) − z]⁻¹ (E[f(x)] − z),

which becomes:

I²(x) = (E[f(x)] − z)ᵀ (Var[f(x)] + Var[d] + Var[e])⁻¹ (E[f(x)] − z)

• where Var[f(x)], Var[d] and Var[e] are now the multivariate emulator variance, multivariate model discrepancy and multivariate observational errors respectively (all 11×11 matrices).
• We now have two implausibility measures, IM(x) and I(x), that we can use to reduce the input space.
• We impose suitable cutoffs on each measure to define a smaller set of non-implausible inputs.
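The multivariate measure is a Mahalanobis-type distance and can be sketched as follows, again taking the emulator summaries and the three 11×11 variance matrices as given:

```python
import numpy as np

def multivariate_implausibility(E_f, z, Var_f, Var_d, Var_e):
    """I^2(x) = (E[f(x)]-z)^T (Var[f(x)]+Var[d]+Var[e])^{-1} (E[f(x)]-z)."""
    r = np.asarray(E_f, dtype=float) - np.asarray(z, dtype=float)
    V = np.asarray(Var_f) + np.asarray(Var_d) + np.asarray(Var_e)
    # solve rather than invert: more stable for the 11x11 system
    return float(r @ np.linalg.solve(V, r))
```

Because it pools all 11 outputs through the full covariance, this measure can rule out points that each univariate I(i)(x) lets through, which is why both cutoffs are applied together.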
History Matching via Implausibility: a 1D Example
[Sequence of figure slides showing implausibility cutting out 1D input space]
Iterative Refocussing Strategy for Reducing Input Space
We use an iterative strategy to reduce the input parameter space. Denoting the current non-implausible volume by Xj, at each stage or wave we:
1. Design a set of runs over the non-implausible input region Xj.
2. Construct new emulators for f(x), valid only over this region Xj.
3. Evaluate the new implausibility function IM(x) over Xj.
4. Define a new (reduced) non-implausible region Xj+1, by IM(x) < cM, which should satisfy X ⊂ Xj+1 ⊂ Xj.
This algorithm is continued until (a) we run out of computational resources, or (b) the emulators are found to be of sufficient accuracy compared to the other uncertainties present (model discrepancy and observational errors).
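The four-step wave loop can be sketched schematically. Here `run_model`, `emulate`, and `i_max` are hypothetical stand-ins for the expensive simulator, the emulator-fitting step, and the implausibility evaluation; the candidate list plays the role of Xj:

```python
def history_match(initial_candidates, run_model, emulate, i_max,
                  c_M=3.0, n_waves=4, n_runs=200):
    """Schematic iterative refocussing: X_0 -> X_1 -> ... via I_M cutoffs."""
    candidates = list(initial_candidates)          # represents X_j
    for wave in range(n_waves):
        # 1. design runs over X_j (stand-in for a proper space-filling design)
        design = candidates[:n_runs]
        runs = [(x, run_model(x)) for x in design]
        # 2. construct new emulators valid only over X_j
        emulator = emulate(runs)
        # 3.+4. evaluate I_M and keep only points with I_M(x) < c_M
        candidates = [x for x in candidates if i_max(emulator, x) < c_M]
        if not candidates:                         # X may turn out to be empty
            break
    return candidates
```

Each wave's emulator only needs to be accurate over the surviving region, which is what lets the accuracy improve as the space shrinks.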
History Matching via Implausibility: 1D Example
[Sequence of figure slides showing waves 1 and 2 of the 1D history match]
Galform: Visualizing the non-implausible volumes Xj
UQ12 minitutorial - session 6 50 / 68
• Plotting the non-implausible region Xj in 1 dimension is trivial, but how do we view a 17-dimensional Xj?
• Even though our emulators are very fast to evaluate, we still cannot cover the 17-dimensional space with a simple grid design.
• We instead use efficient emulator designs, developed specifically for constructing lower-dimensional projections, e.g.
• 2-Dimensional Projection: for each pair of active variables we evaluate the emulator on a large (2D grid) × (15D Latin hypercube) design.
• Very fast emulator design, as we can take several algebraic shortcuts due to symmetries of the emulator update: approximately 5 times faster.
• Using this, we can now produce 2D Minimised Implausibility Projections and Optical Depth Plots.
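The (2D grid) × (15D Latin hypercube) design can be sketched as follows. The dimensions, sizes, and the hand-rolled `latin_hypercube` helper are illustrative assumptions, not the actual design code.

```python
import numpy as np

# Sketch of the (2D grid) x (15D Latin hypercube) projection design.
rng = np.random.default_rng(1)

d = 17                     # total number of inputs
pair = (0, 1)              # the pair of active variables being plotted
n_grid, n_lhc = 20, 50     # grid resolution per axis, hypercube size

def latin_hypercube(n, k, rng):
    """Simple LHC on [0,1]^k: one point per stratum in each column."""
    strata = np.tile(np.arange(n), (k, 1))
    u = (rng.permuted(strata, axis=1).T + rng.random((n, k))) / n
    return u

g = np.linspace(0, 1, n_grid)
gx, gy = np.meshgrid(g, g, indexing="ij")
lhc = latin_hypercube(n_lhc, d - 2, rng)   # 15D design for the other inputs

# Full design: every 2D grid cell paired with every hypercube point
other = [j for j in range(d) if j not in pair]
X = np.empty((n_grid * n_grid * n_lhc, d))
X[:, pair[0]] = np.repeat(gx.ravel(), n_lhc)
X[:, pair[1]] = np.repeat(gy.ravel(), n_lhc)
X[:, other] = np.tile(lhc, (n_grid * n_grid, 1))
```

In practice one might use `scipy.stats.qmc.LatinHypercube` instead of the hand-rolled helper; the point is only that every grid cell shares the same hypercube design over the remaining 15 inputs, which is what permits the algebraic shortcuts mentioned above.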
2D Minimised Implausibility Projections: Wave 1
UQ12 minitutorial - session 6 51 / 68
• Minimised Implausibility Projections: at each 2D grid point, minimise the implausibility IM(x) over the 15D hypercube.
• If a point on these plots is implausible (coloured red), then it will be implausible for any choice of the 15 other inputs.
• If a point is green, it may or may not prove to be an acceptable input.
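A minimal sketch of such a projection, with a toy implausibility function standing in for the real emulator-based IM(x); all names and sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of a 2D minimised-implausibility projection (toy I_M).
rng = np.random.default_rng(2)

def I_M(X):
    # toy implausibility: small near the centre of the input cube
    return 5.0 * np.linalg.norm(X - 0.5, axis=-1)

n_grid, n_lhc, d = 30, 200, 17
g = np.linspace(0, 1, n_grid)
lhc = rng.random((n_lhc, d - 2))     # design over the other 15 inputs

proj = np.empty((n_grid, n_grid))
for i, xi in enumerate(g):
    for j, yj in enumerate(g):
        X = np.empty((n_lhc, d))
        X[:, 0], X[:, 1] = xi, yj    # the plotted pair, held fixed
        X[:, 2:] = lhc               # remaining inputs from the design
        proj[i, j] = I_M(X).min()    # minimise over the hypercube

# proj can now be drawn as an image: cells with proj >= c_M (red) are
# implausible for every tested choice of the other 15 inputs.
```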
2D Min Implausibility Projections: Wave 1 to 4 (0.12%)
UQ12 minitutorial - session 6 52 / 68
2D Optical Depth Plots: Wave 2
UQ12 minitutorial - session 6 54 / 68
• Optical Depth Plots: at each 2D grid point plot the proportion of the 15D Latin hypercube points that survive the cutoff IM(x) < cM.
• These plots show the ‘depth’ of the non-implausible volume Xj for wave j, at each grid point.
• They show where the majority of non-implausible points can be found, but not necessarily where the best matches are.
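The optical depth computation can be sketched the same way, again with a toy implausibility standing in for the emulator-based IM(x); the names, sizes, and cutoff are illustrative assumptions.

```python
import numpy as np

# Sketch of a 2D optical depth plot (toy I_M in place of the emulator).
rng = np.random.default_rng(3)

def I_M(X):
    return 3.0 * np.linalg.norm(X - 0.5, axis=-1)   # toy implausibility

n_grid, n_lhc, d, c_M = 30, 500, 17, 3.0
g = np.linspace(0, 1, n_grid)
lhc = rng.random((n_lhc, d - 2))

depth = np.empty((n_grid, n_grid))
for i, xi in enumerate(g):
    for j, yj in enumerate(g):
        X = np.empty((n_lhc, d))
        X[:, 0], X[:, 1] = xi, yj
        X[:, 2:] = lhc
        # proportion of hypercube points surviving the cutoff
        depth[i, j] = np.mean(I_M(X) < c_M)
```

A cell can have a small minimised implausibility yet a small optical depth: a good match exists there, but only in a thin sliver of the other 15 dimensions.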
2D Optical Depth Plots: Wave 1 to Wave 4
UQ12 minitutorial - session 6 55 / 68
2D Implausibility Projections: Stage 4 (0.12%)
UQ12 minitutorial - session 6 56 / 68
Why Does Iterative Refocussing Work?
UQ12 minitutorial - session 6 57 / 68
Why do we reduce space in waves? Why not attempt to do it all at once? Because this requires an accurate emulator valid over the whole input space.
• In contrast, the iterative approach is far more efficient.
• At each wave the emulators are found to be significantly more accurate (in that Var[f(x)] becomes smaller). This is expected as:
1. We have ‘zoomed in’ on a smaller part of the function, which will be smoother and most likely easier to fit with low-order polynomials.
2. We have a much higher density of runs in the new volume, and hence the Gaussian process part of the emulator will do more work.
3. We can identify more active variables, leading to more detailed polynomial and Gaussian process parts of the emulator, as previously dominant variables are now somewhat suppressed.
• This is a major strength of the History Matching approach.
Visualisation: Fast Approximate Emulators
UQ12 minitutorial - session 6 58 / 68
The emulation approach we have taken, that of building substantial structure into the polynomial part of the emulator, allows the construction of fast approximations to the emulators.
• If we take the regression part of the wave 4 emulator only:

fi(x) = Σj βij gij(xA) + wi,

and assume Var[wi] = α²(σ²ui + σ²δi) for some conservative choice of α > 1, we can then use this fast approximate emulator to screen all candidate input points.
• Using this method we were able to reduce the input space to 1.2% of its original volume.
• We then only had to use the full, slower wave 1 to 4 emulators on the surviving points, which were used to generate the higher-dimensional figures.
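A sketch of this screening idea, with a toy 'simulator' and an ordinary least-squares fit standing in for the wave 4 regression surface; the data, α, and error values are illustrative assumptions.

```python
import numpy as np

# Sketch of the fast approximate emulator: keep only the fitted
# regression (polynomial) part, inflate its residual variance by a
# conservative alpha > 1, and screen candidates cheaply.
rng = np.random.default_rng(4)

def f(x):
    # stand-in simulator: linear trend plus a small nonlinear 'wiggle'
    return 2.0 + 3.0 * x[..., 0] - 1.5 * x[..., 1] + 0.3 * np.sin(8 * x[..., 0])

# Fit the regression part on training runs (basis: 1, x1, x2)
X_train = rng.random((60, 2))
y_train = f(X_train)
G = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(G, y_train, rcond=None)
resid_var = np.mean((G @ beta - y_train) ** 2)  # stands in for sigma_u^2 + sigma_delta^2

alpha, sigma_obs, c_M = 2.0, 0.05, 3.0
z = f(np.array([0.6, 0.4]))      # 'observed' output

def screen(X):
    """Keep candidates whose approximate implausibility is below c_M."""
    mu = np.column_stack([np.ones(len(X)), X]) @ beta
    var = alpha**2 * resid_var + sigma_obs**2    # inflated, conservative
    I = np.abs(z - mu) / np.sqrt(var)
    return X[I < c_M]

candidates = rng.random((100000, 2))
survivors = screen(candidates)   # only these go to the full emulators
```

The inflation by α > 1 is what makes the screen conservative: points rejected by the fast approximation would also be rejected by the full emulator, so nothing acceptable is lost.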
3D Minimised Implausibility and Optical Depth Plots
UQ12 minitutorial - session 6 59 / 68
• 3D projections created using the Fast Approximate Emulator approach.
4-Dimensional Implausibility Plots: Anyone?
UQ12 minitutorial - session 6 60 / 68
2D Implausibility Projections: Stage 4 (0.12%)
UQ12 minitutorial - session 6 62 / 68
Wave 5 runs
UQ12 minitutorial - session 6 63 / 68
bj Luminosity Output of Waves 1,2,3 and 5
UQ12 minitutorial - session 6 64 / 68
Conclusions and Further Issues
UQ12 minitutorial - session 6 66 / 68
• History Matching using iterative refocussing: a very efficient technique for learning about the set of acceptable inputs X.
• Often more appropriate than a fully probabilistic calibration, or should at the very least precede calibration.
• We have developed novel methods to visualise X that exploit the structure of the emulators.
• We now have a large set of acceptable (Wave 5) runs that can be analysed by the cosmologists, and used to explore other features of Galform.
Conclusions and Further Issues
UQ12 minitutorial - session 6 67 / 68
• Future work: beginning to explore the more advanced Galform 2 model: more galaxies, longer run time, far more outputs to match and uncertainties to assess.
Vernon, I., Goldstein, M., and Bower, R. (2010), “Galaxy Formation: a Bayesian Uncertainty Analysis”, Bayesian Analysis, 5(4): 619–670. Invited discussion paper. MUCM Technical Report 10/03.
Bower, R., Vernon, I., Goldstein, M., et al. (2010), “The Parameter Space of Galaxy Formation”, Mon. Not. Roy. Astron. Soc., 407: 2017–2045. MUCM Technical Report 10/02.
— History Matching now available on the MUCM toolkit —
References
UQ12 minitutorial - session 6 68 / 68
P.S. Craig, M. Goldstein, A.H. Seheult, J.A. Smith (1997). Pressure matching for hydrocarbon reservoirs: a case study in the use of Bayes linear strategies for large computer experiments (with discussion), in Case Studies in Bayesian Statistics, vol. III, eds. C. Gatsonis et al., 37–93. Springer-Verlag.
M. Goldstein and J.C. Rougier (2008). Reified Bayesian modelling and inference for physical systems (with discussion), JSPI, to appear.
Kennedy, M.C. and O’Hagan, A. (2001). Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society, B, 63, 425–464.
Santner, T., Williams, B. and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer-Verlag: New York.
Bower, R.G., Benson, A.J. et al. (2006). The broken hierarchy of galaxy formation, Mon. Not. Roy. Astron. Soc., 370, 645–655.