minimal-dispersion and maximum- likelihood predictors with ... · – minimal dispersion •...

Luis G. Crespo, Sean P. Kenny, and Daniel P. Giesy

Dynamic Systems and Controls BranchNASA Langley Research Center, Hampton, VA, 23666

Minimal-Dispersion and Maximum-Likelihood Predictors with a Linear Staircase

Structure

2

Outline

•  Problem statement •  Background •  Random predictor models •  Conclusions

3

Problem Statement•  Goal: create a computational model of a Data Generating

Mechanism (DGM) given N input-output pairs D={x(i), y(i)}

4 DGM is a deterministic function of 2 inputs without noise

Problem Statement: On the DGM

5 Model form uncertainty vs. deterministic function + colored noise

Problem Statement: On the DGM

6

Problem Statement•  Parametric models vs. non-parametric models •  This paper focuses on the parametric model

•  This form is implied by the superposition property of linear system theory

•  The calibration problem of interest is not standard since the calibrated variable is unobservable

7

Outline

•  Problem statement •  Computational models •  Staircase variables •  Random predictor models •  Conclusions

8

Computational Models•  Interval Predictor Models (IPM)

9

Interval Predictor Models•  The output is an interval valued function of the input •  IPM considered here are given by

•  This leads to

where the IPM boundaries are known analytically •  Interval and functional representation •  The spread of the IPM is

10

Interval Predictor Models•  IPMs are calculated by solving the convex program

Additional set of constraints

11

Interval Predictor Models

np=10 N=1K

12

Interval Predictor Models

np=20 N=1K

13

Interval Predictor Models: Reliability•  Reliability of the Predictor: scenario theory enables

bounding the probability of a future observation falling outside the IPM: distribution-free, non-asymptotic

•  This is a probabilistic certificate of correctness prescribing the interplay between the amount of information available, the complexity of the model, a confidence parameter, and the reliability of the model

14

Computational Models•  Random Predictor Models (RPM)

15

Maximal-entropy Staircase RPM

The modality and skewness vary strongly with the input

Random Predictor Models

16

Outline

•  Problem statement •  Computational models •  Staircase variables •  Random predictor models •  Conclusions

17

Background

•  Hyper-parameters

•  Desired variables must match these constraints •  Only some are feasible •  Polynomial feasibility constraints:

18

Background: 𝛉-Feasibility Equations

Distribution free

19

•  A staircase random variables has a piecewise constant density function over a uniform partition of the domain that match the constraints imposed by

•  Staircases are found by solving the convex program

Staircase Variables

-1 -0.5 0 0.5 1

f z

0

0.5

1

-1 -0.5 0 0.5 1

f z

0

0.5

1

z-1 -0.5 0 0.5 1

f z

0

0.5

1

20

•  Able to represent a wide range of density shapes by using different optimality criteria –  Max entropy –  Max likelihood –  Max degree of unimodality, etc

•  Able to represent most of the feasible space •  Low-computational cost: from convex optimization

Staircase Variables: Key Attributes

21

Outline

•  Problem statement •  Computational models •  Staircase variables •  Random predictor models

–  Moment-matching –  Minimal dispersion

•  Conclusions

22

Random Predictor Models•  The output is a random process •  RPM considered here are given by

•  Goal: given the data sequence D={x(i), y(i)} we want to characterize the distribution of p

x

y ⨯

⨯ ⨯

p1

p2

?

23

Random Predictor Models•  Bayesian/Maximum likelihood approach

–  Pros: any model, any distribution –  Cons: expensive, tight to assumed distribution

x

y ⨯

⨯ ⨯

p1

p2

24

Random Predictor Models•  Taking the expected value of the model equation we have

which can be combined to obtain the moment functions

Parameter independency is assumed hereafter

25

Moment-Matching RPMs•  Idea: Find the moments of p leading to a prediction that

minimizes the offset between the predicted moments and the empirical moments

•  A sliding-window approach is used to estimate the empirical moments

•  The predicted moments, given by

depend upon the design variables:

26

Moment-Matching RPMs•  Solution Approach: a sequence of optimization programs for

moments of increasing order. 1.  Solve for the mean

2.  Find a feasible support set using IPMs 3.  Solve for the variance

for

4.  Find a feasible support set using IPMs….

27

Moment-Matching RPMs•  Outcome:

•  Advantage: approach is distribution-free: no need to assume a distribution for p upfront

•  Setting a particular uncertainty model: use staircase variables to realize the optimal moments

28

Moment-Matching Example•  Goal: to characterize the unknown loading of a cantilever

beam from displacement measurements •  A datum in the sequence is a set of measurements

29

Moment-Matching Example•  Basis chosen from Euler-beam theory:

30

Moment-Matching Example

Skewed response

31

Moment-Matching Example

32

Minimal-Dispersion RPM•  Idea: find the moments of p leading to a prediction that

concentrates the response as close as possible to the data while enclosing it into a high-probability region (trade-off)

•  Solution approach: solve the optimization program

where

and the high-probability region is

33

Minimal-Dispersion RPM•  Same outcome and advantage as the previous approach •  When to use: unimodal DGM •  Challenge: characterizing 𝛪𝛂 as a function of 𝛉 𝛂 as a function of 𝛉 as a function of 𝛉 In the paper we use:

but a better 𝛪𝛂 can be derived using regression/staircases 𝛂 can be derived using regression/staircases can be derived using regression/staircases

34

Minimal-Dispersion: Example•  Consider the data-cloud, and an arbitrary basis

0 0.2 0.4 0.6 0.8 1x

-5

-4

-3

-2

-1

0

1

2

3

4

y

DataIPM

35

Minimal-Dispersion: Example•  Resulting RPM

36

Minimal-Dispersion: Example•  Distribution of the staircase parameters

p10 0.2 0.4 0.6

f p1

0

2

p2-0.2 0 0.2 0.4 0.6

f p2

00.51

p30 0.5 1

f p30

0.5

p40 0.5 1

f p4

0

1

p50 0.5 1 1.5

f p5

0

0.5

p60 0.5 1 1.5

f p6

0

0.5

p70 0.5 1

f p7

00.51

p80 0.5 1

f p8

00.51

p9-0.1 0 0.1 0.2 0.3 0.4

f p9

0

2

p10-0.2 0 0.2 0.4 0.6 0.8

f p10

00.51

37

Conclusions

•  A framework for calibrating affine probabilistic models was developed

•  Technique is moment-based and distribution-free •  Computational demands are considerably lower than

maximum/likelihood based approaches •  Eliminates the need for assuming a distribution of the

uncertainty upfront •  Analytical propagation of moments is possible when

dependency is a known polynomial (we only did linear)

38

Conclusions

•  Parameter dependencies can be accounted for (not done here, cumbersome)

•  All sources of uncertainty and error are lumped into the resulting characterization of p...

Luis G. Crespo, Sean P. Kenny, and Daniel P. Giesy

Dynamic Systems and Controls BranchNASA Langley Research Center, Hampton, VA, 23666

Random Predictor Models with a Linear Staircase Structure

40

Staircases

•  Consider a random variable z with probability density function (PDF) and support set

•  The central moments, defined as

are assumed to exist •  Goal: to calculate a random variable with a bounded

support given values for the first four moments

41

𝛉-Feasibility-Feasibility

•  Does there exist a random variable that meets the constraints imposed by ?

•  Distribution-free vs. distribution fixed •  Such a random variable(s) exist if the set of

polynomial constraints is satisfied

[1] - Sharma 2015, ArXiv 1503.03786; Kumar 2012

42

𝛉-Feasibility: equations-Feasibility: equations

43


•  Feasible domain

44

𝛉-Feasibility: intersections-Feasibility: intersections

45



•  This set is non-convex •  Standard random variables cannot realize most of Θ

46

𝛉-Feasibility: intersections-Feasibility: intersections

47



•  This set is non-convex •  Standard random variables cannot realize most of Θ •  There might exist infinitely many random variables

able to realize a feasible point

48



•  This set is non-convex •  Standard random variables cannot realize most of Θ •  There might exist infinitely many random variables

able to realize a feasible point •  How to construct a family of random variables that

can realize most of Θ?

49

Staircase random variables

•  Staircase variables have a piecewise constant PDF over a uniform partition of : nb bins

•  The PDF of a staircase variable is given by

where is given by

50


•  Cost to be defined later •  Hyper-parameter: •  The above equation can be written as

51


•  If the cost function is convex, calculating a staircase variable entails solving a convex optimization program: efficiently done for hundreds of thousands of constraints/design variables

•  This optimization problem might be infeasible: distribution-fixed

52

Staircase variables: cost function

•  Does not affect staircase-feasibility •  Three classes considered

–  Maximal entropy

–  Minimal squared likelihood –  Optimal target matching

•  Other costs: max/min likelihood, min support, etc. •  Let’s explore their structure and dependencies

53

Staircase random variables: nb

-1 -0.5 0 0.5 1f z

0

0.5

1

-1 -0.5 0 0.5 1

f z

0

0.5

1

z-1 -0.5 0 0.5 1

f z

0

0.5

1

54

Staircase random variables: cost J

z-1 -0.5 0 0.5 1

f z

0

0.2

0.4

0.6

0.8

1

1.2 DGMMax EntropyMin Square LikelihoodPDF matching

55

Staircase variables: worst-case variable

F

z

Pbox(J|θ)

1

56

Staircase variables: worst-case PDF

F

z

P[ z > z1 | θ ] = [𝛂, 𝛃] 1

z1

𝛂

𝛃

57

Staircase variables: feasibility

•  The staircase feasible space is defined as

•  How much of Θ can staircase variables represent?

nb=10

Θ

Θ\S m

3

m2 m2 m2

nb=10

Θ S

Θ\S m

3

m2 m2 m2

nb=10

nb=100

nb=200

Θ

Θ

Θ

S

S

S

Θ\S

Θ\S

Θ\S

m3

m3

m3

m2 m2 m2

62

Staircase variables: feasibility

63

Setting Target Functions

64


65


66


67

-3 -2 -1 0 1 2 3x

0

0.5

1

1.5

2

y

True values Approximation

True Moments vs. Approximation

z

z

µ

m2

m3

m4

minimal-dispersion and maximum- likelihood predictors with ... · – minimal dispersion •...

Documents