minimal-dispersion and maximum- likelihood predictors with ... · – minimal dispersion •...
TRANSCRIPT
Luis G. Crespo, Sean P. Kenny, and Daniel P. Giesy
Dynamic Systems and Controls BranchNASA Langley Research Center, Hampton, VA, 23666
Minimal-Dispersion and Maximum-Likelihood Predictors with a Linear Staircase
Structure
2
Outline
• Problem statement • Background • Random predictor models • Conclusions
3
Problem Statement• Goal: create a computational model of a Data Generating
Mechanism (DGM) given N input-output pairs D={x(i), y(i)}
4 DGM is a deterministic function of 2 inputs without noise
Problem Statement: On the DGM
5 Model form uncertainty vs. deterministic function + colored noise
Problem Statement: On the DGM
6
Problem Statement• Parametric models vs. non-parametric models • This paper focuses on the parametric model
• This form is implied by the superposition property of linear system theory
• The calibration problem of interest is not standard since the calibrated variable is unobservable
7
Outline
• Problem statement • Computational models • Staircase variables • Random predictor models • Conclusions
8
Computational Models• Interval Predictor Models (IPM)
9
Interval Predictor Models• The output is an interval valued function of the input • IPM considered here are given by
• This leads to
where the IPM boundaries are known analytically • Interval and functional representation • The spread of the IPM is
10
Interval Predictor Models• IPMs are calculated by solving the convex program
Additional set of constraints
11
Interval Predictor Models
np=10 N=1K
12
Interval Predictor Models
np=20 N=1K
13
Interval Predictor Models: Reliability• Reliability of the Predictor: scenario theory enables
bounding the probability of a future observation falling outside the IPM: distribution-free, non-asymptotic
• This is a probabilistic certificate of correctness prescribing the interplay between the amount of information available, the complexity of the model, a confidence parameter, and the reliability of the model
14
Computational Models• Random Predictor Models (RPM)
15
Maximal-entropy Staircase RPM
The modality and skewness vary strongly with the input
Random Predictor Models
16
Outline
• Problem statement • Computational models • Staircase variables • Random predictor models • Conclusions
17
Background
• Hyper-parameters
• Desired variables must match these constraints • Only some are feasible • Polynomial feasibility constraints:
18
Background: 𝛉-Feasibility Equations
Distribution free
19
• A staircase random variables has a piecewise constant density function over a uniform partition of the domain that match the constraints imposed by
• Staircases are found by solving the convex program
Staircase Variables
-1 -0.5 0 0.5 1
f z
0
0.5
1
-1 -0.5 0 0.5 1
f z
0
0.5
1
z-1 -0.5 0 0.5 1
f z
0
0.5
1
20
• Able to represent a wide range of density shapes by using different optimality criteria – Max entropy – Max likelihood – Max degree of unimodality, etc
• Able to represent most of the feasible space • Low-computational cost: from convex optimization
Staircase Variables: Key Attributes
21
Outline
• Problem statement • Computational models • Staircase variables • Random predictor models
– Moment-matching – Minimal dispersion
• Conclusions
22
Random Predictor Models• The output is a random process • RPM considered here are given by
• Goal: given the data sequence D={x(i), y(i)} we want to characterize the distribution of p
x
y ⨯
⨯ ⨯
p1
p2
?
23
Random Predictor Models• Bayesian/Maximum likelihood approach
– Pros: any model, any distribution – Cons: expensive, tight to assumed distribution
x
y ⨯
⨯ ⨯
p1
p2
24
Random Predictor Models• Taking the expected value of the model equation we have
which can be combined to obtain the moment functions
Parameter independency is assumed hereafter
25
Moment-Matching RPMs• Idea: Find the moments of p leading to a prediction that
minimizes the offset between the predicted moments and the empirical moments
• A sliding-window approach is used to estimate the empirical moments
• The predicted moments, given by
depend upon the design variables:
26
Moment-Matching RPMs• Solution Approach: a sequence of optimization programs for
moments of increasing order. 1. Solve for the mean
2. Find a feasible support set using IPMs 3. Solve for the variance
for
4. Find a feasible support set using IPMs….
27
Moment-Matching RPMs• Outcome:
• Advantage: approach is distribution-free: no need to assume a distribution for p upfront
• Setting a particular uncertainty model: use staircase variables to realize the optimal moments
28
Moment-Matching Example• Goal: to characterize the unknown loading of a cantilever
beam from displacement measurements • A datum in the sequence is a set of measurements
29
Moment-Matching Example• Basis chosen from Euler-beam theory:
30
Moment-Matching Example
Skewed response
31
Moment-Matching Example
32
Minimal-Dispersion RPM• Idea: find the moments of p leading to a prediction that
concentrates the response as close as possible to the data while enclosing it into a high-probability region (trade-off)
• Solution approach: solve the optimization program
where
and the high-probability region is
33
Minimal-Dispersion RPM• Same outcome and advantage as the previous approach • When to use: unimodal DGM • Challenge: characterizing 𝛪𝛂 as a function of 𝛉 𝛂 as a function of 𝛉 as a function of 𝛉 In the paper we use:
but a better 𝛪𝛂 can be derived using regression/staircases 𝛂 can be derived using regression/staircases can be derived using regression/staircases
34
Minimal-Dispersion: Example• Consider the data-cloud, and an arbitrary basis
0 0.2 0.4 0.6 0.8 1x
-5
-4
-3
-2
-1
0
1
2
3
4
y
DataIPM
35
Minimal-Dispersion: Example• Resulting RPM
36
Minimal-Dispersion: Example• Distribution of the staircase parameters
p10 0.2 0.4 0.6
f p1
0
2
p2-0.2 0 0.2 0.4 0.6
f p2
00.51
p30 0.5 1
f p30
0.5
p40 0.5 1
f p4
0
1
p50 0.5 1 1.5
f p5
0
0.5
p60 0.5 1 1.5
f p6
0
0.5
p70 0.5 1
f p7
00.51
p80 0.5 1
f p8
00.51
p9-0.1 0 0.1 0.2 0.3 0.4
f p9
0
2
p10-0.2 0 0.2 0.4 0.6 0.8
f p10
00.51
37
Conclusions
• A framework for calibrating affine probabilistic models was developed
• Technique is moment-based and distribution-free • Computational demands are considerably lower than
maximum/likelihood based approaches • Eliminates the need for assuming a distribution of the
uncertainty upfront • Analytical propagation of moments is possible when
dependency is a known polynomial (we only did linear)
38
Conclusions
• Parameter dependencies can be accounted for (not done here, cumbersome)
• All sources of uncertainty and error are lumped into the resulting characterization of p...
Luis G. Crespo, Sean P. Kenny, and Daniel P. Giesy
Dynamic Systems and Controls BranchNASA Langley Research Center, Hampton, VA, 23666
Random Predictor Models with a Linear Staircase Structure
40
Staircases
• Consider a random variable z with probability density function (PDF) and support set
• The central moments, defined as
are assumed to exist • Goal: to calculate a random variable with a bounded
support given values for the first four moments
41
𝛉-Feasibility-Feasibility
• Does there exist a random variable that meets the constraints imposed by ?
• Distribution-free vs. distribution fixed • Such a random variable(s) exist if the set of
polynomial constraints is satisfied
[1] - Sharma 2015, ArXiv 1503.03786; Kumar 2012
42
𝛉-Feasibility: equations-Feasibility: equations
43
𝛉-Feasibility-Feasibility
• Feasible domain
44
𝛉-Feasibility: intersections-Feasibility: intersections
45
𝛉-Feasibility-Feasibility
• Feasible domain
• This set is non-convex • Standard random variables cannot realize most of Θ
46
𝛉-Feasibility: intersections-Feasibility: intersections
47
𝛉-Feasibility-Feasibility
• Feasible domain
• This set is non-convex • Standard random variables cannot realize most of Θ • There might exist infinitely many random variables
able to realize a feasible point
48
𝛉-Feasibility-Feasibility
• Feasible domain
• This set is non-convex • Standard random variables cannot realize most of Θ • There might exist infinitely many random variables
able to realize a feasible point • How to construct a family of random variables that
can realize most of Θ?
49
Staircase random variables
• Staircase variables have a piecewise constant PDF over a uniform partition of : nb bins
• The PDF of a staircase variable is given by
where is given by
50
Staircase random variables
• Cost to be defined later • Hyper-parameter: • The above equation can be written as
51
Staircase random variables
• If the cost function is convex, calculating a staircase variable entails solving a convex optimization program: efficiently done for hundreds of thousands of constraints/design variables
• This optimization problem might be infeasible: distribution-fixed
52
Staircase variables: cost function
• Does not affect staircase-feasibility • Three classes considered
– Maximal entropy
– Minimal squared likelihood – Optimal target matching
• Other costs: max/min likelihood, min support, etc. • Let’s explore their structure and dependencies
53
Staircase random variables: nb
-1 -0.5 0 0.5 1f z
0
0.5
1
-1 -0.5 0 0.5 1
f z
0
0.5
1
z-1 -0.5 0 0.5 1
f z
0
0.5
1
54
Staircase random variables: cost J
z-1 -0.5 0 0.5 1
f z
0
0.2
0.4
0.6
0.8
1
1.2 DGMMax EntropyMin Square LikelihoodPDF matching
55
Staircase variables: worst-case variable
F
z
Pbox(J|θ)
1
56
Staircase variables: worst-case PDF
F
z
P[ z > z1 | θ ] = [𝛂, 𝛃] 1
z1
𝛂
𝛃
57
Staircase variables: feasibility
• The staircase feasible space is defined as
• How much of Θ can staircase variables represent?
nb=10
Θ
Θ\S m
3
m2 m2 m2
nb=10
Θ S
Θ\S m
3
m2 m2 m2
nb=10
Θ S
Θ\S m
3
m2 m2 m2
nb=10
nb=100
nb=200
Θ
Θ
Θ
S
S
S
Θ\S
Θ\S
Θ\S
m3
m3
m3
m2 m2 m2
62
Staircase variables: feasibility
63
Setting Target Functions
64
Setting Target Functions
65
Setting Target Functions
66
Setting Target Functions
67
-3 -2 -1 0 1 2 3x
0
0.5
1
1.5
2
y
True values Approximation
True Moments vs. Approximation
z
z
µ
m2
m3
m4