Bayesian Techniques for Parameter Estimation

“He has Van Gogh’s ear for music,” Billy Wilder

Upload: marco-rodriguez-garcia

Post on 19-Feb-2016


DESCRIPTION

Bayes recursion method, Maximum Likelihood

TRANSCRIPT

Page 1: Maximum CMC

Bayesian Techniques for Parameter Estimation

“He has Van Gogh’s ear for music,” Billy Wilder

Page 2:

Statistical Inference

Goal: The goal in statistical inference is to make conclusions about a phenomenon based on observed data.

Frequentist: Observations made in the past are analyzed with a specified model. The result is regarded as confidence about the state of the real world.

• Probabilities defined as frequencies with which an event occurs if the experiment is repeated many times.

• Parameter Estimation:

o Relies on estimators derived from different data sets and a specific sampling distribution.

o Parameters may be unknown but are fixed

Bayesian: Interpretation of probability is subjective and can be updated with new data.

• Parameter Estimation: Parameters are described by a probability density.

Page 3:

Bayesian Inference

Framework:

• Prior Distribution: Quantifies prior knowledge of parameter values.

• Likelihood: Probability of observing the data given a certain set of parameter values.

• Posterior Distribution: Conditional probability distribution of unknown parameters given observed data.

Joint PDF: Quantifies all combinations of parameters and observations.

Bayes’ Relation: Specifies the posterior in terms of the likelihood, prior, and normalization constant.

Problem: Evaluation of the normalization constant typically requires high-dimensional integration.
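In generic notation (the slide's own symbols are not preserved in this transcript), Bayes' relation for p parameters reads

```latex
\pi(\theta \mid D)
  = \frac{\pi(D \mid \theta)\,\pi_0(\theta)}
         {\int_{\mathbb{R}^p} \pi(D \mid \theta)\,\pi_0(\theta)\,d\theta}
```

where \(\pi_0\) is the prior, \(\pi(D \mid \theta)\) the likelihood, and the denominator is the normalization constant: a p-dimensional integral, which is exactly the evaluation problem noted above.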

Page 4:

Bayesian Inference

Uninformative Prior: No a priori information about parameters.

Informative Prior: Use conjugate priors; prior and posterior are from the same distribution family.

Evaluation Strategies:

• Analytic integration --- Rare

• Classical quadrature; e.g., p = 2

• Monte Carlo quadrature Techniques

• Markov Chains

Page 5:

Bayesian Inference

Example:

Page 6:

Bayesian Inference

Example:

1 Head, 0 Tails; 5 Heads, 9 Tails; 49 Heads, 51 Tails
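The slides' coin-flip figures are not reproduced in this transcript, but the update they illustrate is the conjugate Beta-Bernoulli case: with a Beta(a, b) prior on the heads probability, observing h heads and t tails gives the posterior Beta(a + h, b + t). A minimal sketch (function name and the uniform Beta(1, 1) default are illustrative):

```python
# Conjugate Beta-Bernoulli update for the coin-flip example above.
def beta_posterior(h, t, a=1.0, b=1.0):
    """Posterior (alpha, beta) after h heads and t tails; Beta(1, 1) is the uniform prior."""
    return a + h, b + t

for h, t in [(1, 0), (5, 9), (49, 51)]:
    a, b = beta_posterior(h, t)
    print(f"{h} heads, {t} tails -> posterior mean {a / (a + b):.3f}")
```

As the data accumulate, the posterior mean moves from the prior toward the observed heads fraction.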

Note:

Page 7:

Bayesian Inference

Example: Now consider

Note: A poorly chosen informative prior incorrectly influences results for a long time.

50 Heads, 50 Tails; 5 Heads, 5 Tails

Page 8:

Likelihood:

Assumption:

Note:

Parameter Estimation Problem

Page 9:

Parameter Estimation: Example

Example: Consider the spring model

Page 10:

Parameter Estimation: Example

Ordinary Least Squares: Here

Sampling Distribution

Page 11:

Parameter Estimation: Example

Bayesian Inference: The likelihood is

Strategy: Create a Markov chain using random sampling so that the created chain has the posterior distribution as its limiting (stationary) distribution.

Posterior Distribution:

Page 12:

Markov Chains

Definition:

Note: A Markov chain is characterized by three components: a state space, an initial distribution, and a transition kernel.

State Space:

Initial Distribution: (Mass)

Transition Probability: (Markov Kernel)

Page 13:

Markov Chains

Example:

Chapman-Kolmogorov Equations:
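The Chapman-Kolmogorov equations say that transitioning in m + n steps is the same as transitioning m steps and then n steps: P^(m+n) = P^m P^n. A quick numerical check (the two-state matrix below is an assumed illustration, not from the slides):

```python
import numpy as np

# Numerical check of the Chapman-Kolmogorov relation P^(m+n) = P^m P^n.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

m, n = 2, 3
lhs = np.linalg.matrix_power(P, m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)
assert np.allclose(lhs, rhs)  # m+n-step transitions = m steps then n steps
```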

Page 14:

Markov Chains: Limiting Distribution

Example: Raleigh weather -- Tomorrow’s weather conditioned on today’s weather

(States: rain, sun)

Question:

Definition: This is the limiting distribution (invariant measure)

Page 15:

Markov Chains: Limiting Distribution

Example: Raleigh weather

(States: rain, sun)

Solve:
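The slide's rain/sun transition probabilities are not in this transcript, so the matrix below is an assumed illustration; the solution method is the standard one: the limiting distribution satisfies pi P = pi with components summing to 1.

```python
import numpy as np

# Row i, column j gives P(tomorrow = j | today = i); state order (rain, sun).
# These probabilities are assumed for illustration.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])

# Solve pi P = pi, sum(pi) = 1: stack the normalization constraint onto
# (P^T - I) pi = 0 and solve the (consistent) overdetermined system.
A = np.vstack([P.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)  # approximately [1/3, 2/3] for this P
```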

Page 16:

Irreducible Markov Chains

Irreducible:

Reducible Markov Chain:

(Diagram: reducible chain with disconnected components p1 and p2)

Note: The limiting distribution is not unique if the chain is reducible.

Page 17:

Periodic Markov Chains

Example:

Periodicity: A Markov chain is periodic if parts of the state space are visited at regular intervals. The period k is defined as

Page 18:

Periodic Markov Chains

Example:

Page 19:

Stationary Distribution

Theorem: A finite, homogeneous Markov chain that is irreducible and aperiodic has a unique stationary distribution, and the chain will converge to it in the sense of distributions from any initial distribution.

Recurrence (Persistence):

Example: State 3 is transient

Ergodicity: A state is termed ergodic if it is aperiodic and recurrent. If all statesof an irreducible Markov chain are ergodic, the chain is said to be ergodic.

Page 20:

Matrix Theory

Definition:

Lemma:

Example:

Page 21:

Matrix Theory

Theorem (Perron-Frobenius):

Corollary 1:

Proposition:

Page 22:

Stationary Distribution

Corollary:

Proof:

Convergence: Express

Page 23:

Stationary Distribution

Page 24:

Detailed Balance Conditions

Reversible Chains: A Markov chain determined by the transition matrix is reversible if there is a distribution that satisfies the detailed balance conditions

Proof: We need to show that

Example:
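The slide's worked example is not in this transcript; as an assumed stand-in, a quick numerical check of the detailed balance conditions pi_i P_ij = pi_j P_ji for a two-state chain (any irreducible two-state chain is reversible):

```python
import numpy as np

# Assumed two-state transition matrix and its stationary distribution.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
pi = np.array([1.0 / 3.0, 2.0 / 3.0])  # satisfies pi P = pi for this P

# Detailed balance: probability flow i -> j equals flow j -> i at stationarity.
for i in range(2):
    for j in range(2):
        assert np.isclose(pi[i] * P[i, j], pi[j] * P[j, i])
```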

Page 25:

Markov Chain Monte Carlo Methods

Strategy: Markov chain simulation is used when it is impossible, or computationally prohibitive, to sample directly from the target distribution.

Note:

• In Markov chain theory, we are given a Markov chain, P, and we construct its equilibrium distribution.

• In MCMC theory, we are “given” a distribution and we want to construct a Markov chain that is reversible with respect to it.

Page 26:

Markov Chain Monte Carlo Methods

General Strategy:

Intuition: Recall that

Page 27:

Markov Chain Monte Carlo Methods

Intuition:

Note: Narrower proposal distribution yields higher probability of acceptance.

Page 28:

Metropolis Algorithm

Metropolis Algorithm: [Metropolis and Ulam, 1949]

Page 29:

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm:

Examples:

Note: Considered one of the top 10 algorithms of the 20th century.
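The algorithm listing itself is not reproduced in this transcript; the following is a minimal random-walk Metropolis-Hastings sketch in Python (function name and the standard-normal sanity check are illustrative, not from the slides):

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    For a symmetric proposal the Hastings correction cancels, so the
    acceptance probability reduces to min(1, target(x*) / target(x))."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_samples):
        x_star = x + rng.gauss(0.0, step)                       # propose
        if math.log(rng.random()) < log_target(x_star) - log_target(x):
            x = x_star                                          # accept
        chain.append(x)                                         # else repeat x
    return chain

# Sanity check: sample a standard normal target (log density up to a constant).
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, step=2.0)
mean = sum(chain) / len(chain)
```

Note that rejected proposals repeat the current state in the chain; that repetition is what makes the accept/reject rule leave the target invariant.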

Page 30:

Proposal Distribution

Proposal Distribution: Significantly affects mixing

• Too wide: Too many points are rejected and the chain stays still for long periods;

• Too narrow: Acceptance ratio is high but the algorithm is slow to explore the parameter space;

• Ideally, it should have similar “shape” to posterior (target) distribution.

• Anisotropic posterior, isotropic proposal;

• Efficiency nonuniform for different parameters

Problem:

Result:

• Recovers efficiency of univariate case
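The wide-versus-narrow tradeoff above is easy to see numerically. A sketch measuring the acceptance rate of random-walk Metropolis on a standard normal target for narrow, moderate, and wide proposals (the widths and function name are illustrative choices, not from the slides):

```python
import math
import random

def acceptance_rate(step, n=20000, seed=1):
    """Fraction of accepted proposals for a Gaussian random walk on N(0, 1)."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_star = x + rng.gauss(0.0, step)
        # Metropolis ratio for the standard normal target.
        if math.log(rng.random()) < 0.5 * (x * x - x_star * x_star):
            x, accepted = x_star, accepted + 1
    return accepted / n

for step in (0.1, 2.4, 25.0):
    print(f"proposal width {step:5.1f}: acceptance rate {acceptance_rate(step):.2f}")
```

The narrow proposal accepts almost everything but moves in tiny steps; the wide proposal rejects almost everything and the chain stalls; a moderate width balances the two.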

Page 31:

Proposal Distribution and Acceptance Probability

Proposal Distribution: Two basic approaches

• Choose a fixed proposal function

o Independent Metropolis

• Random walk (local Metropolis)

o Two (of several) choices:

Acceptance Probability:

Page 32:

Random Walk Metropolis Algorithm for Parameter Estimation

Page 33:

Random Walk Metropolis Algorithm for Parameter Estimation

Page 34:

Markov Chain Monte Carlo: Example

Example: Consider the spring model
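The spring-model equations are not preserved in this transcript; as an assumed stand-in, the sketch below takes an undamped spring response z(t) = z0 cos(sqrt(k) t) and estimates the stiffness k from noisy synthetic data with random-walk Metropolis (flat prior on k > 0, Gaussian likelihood). Every number here (z0, k_true, sigma, the proposal width, the chain length) is an illustrative choice.

```python
import math
import random

rng = random.Random(0)
z0, k_true, sigma = 2.0, 20.5, 0.1
times = [0.01 * i for i in range(500)]
data = [z0 * math.cos(math.sqrt(k_true) * t) + rng.gauss(0.0, sigma) for t in times]

def log_post(k):
    """Log posterior up to a constant: flat prior on k > 0, Gaussian errors."""
    if k <= 0.0:
        return -math.inf
    ss = sum((y - z0 * math.cos(math.sqrt(k) * t)) ** 2 for t, y in zip(times, data))
    return -ss / (2.0 * sigma ** 2)

k, lp, chain = 22.0, log_post(22.0), []
for _ in range(5000):
    k_star = k + rng.gauss(0.0, 0.05)          # random-walk proposal
    lp_star = log_post(k_star)
    if math.log(rng.random()) < lp_star - lp:  # Metropolis accept/reject
        k, lp = k_star, lp_star
    chain.append(k)

burned = chain[1000:]                          # discard burn-in
k_hat = sum(burned) / len(burned)
print(f"posterior mean of k: {k_hat:.2f} (true value {k_true})")
```

Caching the current log posterior (lp) avoids recomputing the residual sum at every iteration.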

Page 35:

Markov Chain Monte Carlo: Example

Example: Single parameter c

Page 36:

Markov Chain Monte Carlo: Example

Example: Consider the spring model

Case i:

Page 37:

Markov Chain Monte Carlo: Example

Case i:

Note:

Page 38:

Markov Chain Monte Carlo: Example

Case i:

Page 39:

Markov Chain Monte Carlo: Example

Example: SMA-driven bending actuator -- talk with John Crews

Model: Estimated Parameters:

Page 40:

Markov Chain Monte Carlo: Example

Example: SMA-driven bending actuator -- talk with John Crews

Page 41:

Markov Chain Monte Carlo: Example

Example: SMA-driven bending actuator -- talk with John Crews

Page 42:

Markov Chain Monte Carlo: Example

Example: SMA-driven bending actuator -- talk with John Crews

Page 43:

Transition Kernel and Detailed Balance Condition

Transition Kernel: Recall that

Detailed Balance Condition:

Page 44:

Transition Kernel and Detailed Balance Condition

Detailed Balance Condition: Here

Note:

Transition Kernel: Definition

Page 45:

Sampling Error Variance

Strategy: Treat error variance as parameter to be estimated.

Recall: The assumption that the errors are normally distributed yields

Goal: Determine posterior distribution

Strategy:

• Choose the prior so that the posterior is from the same family --- termed a conjugate prior.

• For a normal distribution with unknown variance, the conjugate prior is the inverse-gamma distribution, which is equivalent to a scaled inverse chi-squared distribution.
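The conjugate update this strategy relies on can be sketched directly: if the n errors are i.i.d. normal with variance sigma^2 and the prior on sigma^2 is InverseGamma(a0, b0), then given residual sum of squares ss the conditional posterior is InverseGamma(a0 + n/2, b0 + ss/2). The function name and the diffuse hyperparameters a0, b0 below are assumed for illustration.

```python
import random

def sample_sigma2(ss, n, a0=1e-3, b0=1e-3, rng=random):
    """Draw sigma^2 from its conjugate inverse-gamma conditional posterior."""
    a = a0 + 0.5 * n
    b = b0 + 0.5 * ss
    # If X ~ Gamma(shape=a, scale=1/b), then 1/X ~ InverseGamma(a, b).
    return 1.0 / rng.gammavariate(a, 1.0 / b)

rng = random.Random(0)
draws = [sample_sigma2(ss=250.0, n=1000, rng=rng) for _ in range(5000)]
mean = sum(draws) / len(draws)  # near ss/n = 0.25 for this diffuse prior
```

In a sampler, this draw alternates with the Metropolis updates of the model parameters, so the error variance is estimated alongside them rather than fixed in advance.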

Page 46:

Definition:

Sampling Error Variance

Strategy:

Note:

Note:

Page 47:

Sampling Error Variance

Example: Consider the spring model

Page 48:

Related Topics

Note: This is an active research area and there are a number of related topics:

• Burn in and convergence

• Adaptive algorithms

• Population Monte Carlo methods

• Sequential Monte Carlo methods and particle filters

• Gaussian mixture models

• Development of metamodels, surrogates and emulators to improve implementation speeds

References:

• A. Solonen, “Monte Carlo Methods in Parameter Estimation of Nonlinear Models,” Master’s Thesis, 2006.

• H. Haario, E. Saksman, J. Tamminen, “An adaptive Metropolis algorithm,” Bernoulli, 7(2), pp. 223-242, 2001.

• C. Andrieu and J. Thomas, “A tutorial on adaptive MCMC,” Statistics and Computing, 18, pp. 343-373, 2008.

• M. Vihola, “Robust adaptive Metropolis algorithm with coerced acceptance rate,” arXiv:1011.4381v2.