Bayesian Techniques for Parameter Estimation
“He has Van Gogh’s ear for music,” Billy Wilder
Statistical Inference
Goal: The goal in statistical inference is to make conclusions about a phenomenon based on observed data.
Frequentist: Observations made in the past are analyzed with a specified model. Result is regarded as confidence about the state of the real world.
• Probabilities defined as frequencies with which an event occurs if the experiment is repeated several times.
• Parameter Estimation:
o Relies on estimators derived from different data sets and a specific sampling distribution.
o Parameters may be unknown but are fixed.
Bayesian: Interpretation of probability is subjective and can be updated with new data.
• Parameter Estimation: Parameters are described by a density.
Bayesian Inference
Framework:
• Prior Distribution: Quantifies prior knowledge of parameter values.
• Likelihood: Probability of observing the data given a certain set of parameter values.
• Posterior Distribution: Conditional probability distribution of unknown parameters given observed data.
Joint PDF: Quantifies all combinations of data and parameters.
Bayes’ Relation: Specifies posterior in terms of likelihood, prior, and normalization constant.
Problem: Evaluation of the normalization constant typically requires high-dimensional integration.
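The displays for these relations were lost in extraction; in standard notation (data D, parameters θ, prior π₀, likelihood L — symbols assumed here), Bayes’ relation reads

```latex
\pi(\theta \mid D) \;=\; \frac{L(D \mid \theta)\,\pi_0(\theta)}
{\int_{\mathbb{R}^p} L(D \mid \theta)\,\pi_0(\theta)\,d\theta}.
```

The denominator is the normalization constant; for p parameters it is a p-dimensional integral, which is the high-dimensional integration problem noted above.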
Bayesian Inference
Uninformative Prior: No a priori information about parameters.
Informative Prior: Use conjugate priors; prior and posterior are from the same distribution family.
Evaluation Strategies:
• Analytic integration --- Rare
• Classical quadrature; e.g., p = 2
• Monte Carlo quadrature techniques
• Markov Chains
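As a minimal sketch of the Monte Carlo quadrature option: the normalization constant ∫ L(D|θ)π₀(θ)dθ can be estimated by averaging the likelihood over draws from the prior. The 1-D likelihood and prior below are invented for illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D setup: Gaussian-shaped likelihood, uniform prior on [0, 1].
def likelihood(theta):
    return np.exp(-0.5 * ((theta - 0.3) / 0.1) ** 2)

# Monte Carlo quadrature: draw theta from the prior, average the likelihood.
N = 100_000
theta = rng.uniform(0.0, 1.0, size=N)   # samples from the U(0,1) prior
Z_mc = likelihood(theta).mean()         # estimate of the normalization constant

# Compare with classical quadrature (midpoint rule) on the same integral.
dx = 1e-4
grid = np.arange(0.0, 1.0, dx) + dx / 2
Z_quad = likelihood(grid).sum() * dx

print(Z_mc, Z_quad)
```

The Monte Carlo estimate converges at rate O(N^{-1/2}) independent of dimension, which is why it displaces classical quadrature once p grows beyond a few parameters.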
Bayesian Inference
Example:
[Figure: posterior densities for the heads probability after 1 Head/0 Tails, 5 Heads/9 Tails, and 49 Heads/51 Tails]
Note:
Bayesian Inference
Example: Now consider
Note: Poor informative prior incorrectly influences results for a long time.
[Figure: posterior densities after 5 Heads/5 Tails and 50 Heads/50 Tails]
Likelihood:
Assumption:
Note:
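The coin-flip posteriors above can be reproduced with the conjugate beta-binomial update: a Beta(α, β) prior on the heads probability combined with h heads and t tails gives a Beta(α + h, β + t) posterior. The Beta(1, 1) (uniform) starting prior below is one uninformative choice, assumed for illustration.

```python
import math

# Conjugate beta-binomial update: prior Beta(alpha, beta) plus h heads
# and t tails yields posterior Beta(alpha + h, beta + t).
def posterior_params(alpha, beta, heads, tails):
    return alpha + heads, beta + tails

# Counts mirroring the slide's panels, starting from a uniform Beta(1,1) prior.
for heads, tails in [(1, 0), (5, 9), (49, 51)]:
    a, b = posterior_params(1.0, 1.0, heads, tails)
    mean = a / (a + b)                               # posterior mean of P(heads)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    print(f"{heads}H/{tails}T: mean={mean:.3f}, sd={sd:.3f}")
```

As the counts grow, the posterior concentrates; a poor informative starting prior is eventually overwhelmed by data but, as the slide notes, influences the results for a long time.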
Parameter Estimation Problem
Parameter Estimation: Example
Example: Consider the spring model
Parameter Estimation: Example
Ordinary Least Squares: Here
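The OLS display was lost in extraction. As a minimal stand-in for a spring parameter-estimation problem (the model, stiffness value, and noise level below are invented for illustration), here is OLS for a static Hooke's-law spring F = k z:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative spring data: F = k z with additive Gaussian noise.
k_true = 20.5                                   # assumed stiffness (N/m)
z = np.linspace(0.1, 1.0, 50)                   # displacements (m)
F = k_true * z + rng.normal(0.0, 0.1, z.size)   # noisy force observations

# OLS estimate: k_hat minimizes the sum of squared residuals (F - k z)^2,
# giving the closed form k_hat = (z . F) / (z . z).
k_hat = z @ F / (z @ z)
residual_var = np.sum((F - k_hat * z) ** 2) / (z.size - 1)
print(k_hat, residual_var)
```

The unbiased residual variance estimate divides by n − 1 (one estimated parameter), and feeds the sampling distribution of the estimator discussed next.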
Sampling Distribution
Parameter Estimation: Example
Bayesian Inference: The likelihood is
Strategy: Create a Markov chain using random sampling so that the created chain has the posterior distribution as its limiting (stationary) distribution.
Posterior Distribution:
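The likelihood and posterior displays were lost; with iid Gaussian measurement errors (symbols vᵢ for observations, f(tᵢ; θ) for the model response, and σ² for the error variance are assumed here), they take the standard form

```latex
L(D \mid \theta) \;=\; \frac{1}{(2\pi\sigma^2)^{n/2}}
\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl[v_i - f(t_i;\theta)\bigr]^2\right),
\qquad
\pi(\theta \mid D) \;\propto\; L(D \mid \theta)\,\pi_0(\theta).
```

Note that the MCMC strategy above needs the posterior only up to its normalization constant.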
Markov Chains
Definition:
Note: A Markov chain is characterized by three components: a state space, an initial distribution, and a transition kernel.
State Space:
Initial Distribution: (Mass)
Transition Probability: (Markov Kernel)
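The three components can be made concrete with a two-state chain; the transition probabilities below are illustrative (the slide's numbers did not survive extraction):

```python
import numpy as np

rng = np.random.default_rng(2)

# The three components of a finite Markov chain:
states = ["rain", "sun"]                 # state space
p0 = np.array([0.5, 0.5])                # initial distribution (mass)
P = np.array([[0.6, 0.4],                # transition kernel: row i holds
              [0.2, 0.8]])               # P(X_{n+1} = j | X_n = i)

# Simulate one realization of the chain.
x = rng.choice(2, p=p0)
path = [x]
for _ in range(10):
    x = rng.choice(2, p=P[x])            # next state depends only on current state
    path.append(x)
print([states[i] for i in path])
```

Each row of the kernel is itself a probability distribution, so rows must sum to one.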
Markov Chains
Example:
Chapman-Kolmogorov Equations:
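The Chapman-Kolmogorov display was lost; in the usual notation for n-step transition probabilities it reads

```latex
p_{ij}^{(m+n)} \;=\; \sum_{k} p_{ik}^{(m)}\, p_{kj}^{(n)},
\qquad \text{equivalently} \qquad
P^{(m+n)} \;=\; P^{(m)} P^{(n)},
```

so the n-step transition matrix is simply the n-th power of the one-step matrix.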
Markov Chains: Limiting Distribution
Example: Raleigh weather -- Tomorrow’s weather conditioned on today’s weather
[Diagram: two-state chain with states rain and sun]
Question:
Definition: This is the limiting distribution (invariant measure)
Markov Chains: Limiting Distribution
Example: Raleigh weather
[Diagram: rain/sun transition probabilities]
Solve:
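Solving for the limiting distribution means finding π with πP = π and Σᵢ πᵢ = 1. A sketch for the two-state weather chain (the transition probabilities are illustrative, since the slide's numbers did not survive extraction):

```python
import numpy as np

# Solve  pi P = pi,  sum(pi) = 1  for a two-state rain/sun chain.
P = np.array([[0.6, 0.4],      # illustrative transition probabilities
              [0.2, 0.8]])

# (P^T - I) pi = 0 is rank-deficient; replace one equation with the
# normalization constraint sum(pi) = 1.
A = np.vstack([(P.T - np.eye(2))[:-1], np.ones(2)])
b = np.array([0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)                      # stationary distribution [P(rain), P(sun)]

# Check: iterating the kernel from any initial state converges to pi.
print(np.linalg.matrix_power(P, 50)[0])
```

For this matrix the stationary distribution is [1/3, 2/3], and every row of P⁵⁰ has essentially converged to it.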
Irreducible Markov Chains
Irreducible:
Reducible Markov Chain:
[Diagram: reducible chain with transition probabilities p1 and p2]
Note: Limiting distribution not unique if chain is reducible.
Periodic Markov Chains
Example:
Periodicity: A Markov chain is periodic if parts of the state space are visited at regular intervals. The period k is defined as
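The defining display was lost; the standard definition of the period of state i is

```latex
k \;=\; \gcd\{\, n \ge 1 \;:\; P(X_n = i \mid X_0 = i) > 0 \,\},
```

and the chain is aperiodic if k = 1 for every state.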
Periodic Markov Chains
Example:
Stationary Distribution
Theorem: A finite, homogeneous Markov chain that is irreducible and aperiodic has a unique stationary distribution, and the chain will converge in the sense of distributions from any initial distribution.
Recurrence (Persistence):
Example: State 3 is transient
Ergodicity: A state is termed ergodic if it is aperiodic and recurrent. If all states of an irreducible Markov chain are ergodic, the chain is said to be ergodic.
Matrix Theory
Definition:
Lemma:
Example:
Matrix Theory
Theorem (Perron-Frobenius):
Corollary 1:
Proposition:
Stationary Distribution
Corollary:
Proof:
Convergence: Express
Stationary Distribution
Detailed Balance Conditions
Reversible Chains: A Markov chain determined by the transition matrix is reversible if there is a distribution that satisfies the detailed balance conditions
Proof: We need to show that
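The display lost here is presumably the stationarity check; writing π for the distribution and p_{ij} for the transition probabilities, detailed balance π_i p_{ij} = π_j p_{ji} gives

```latex
\sum_i \pi_i \, p_{ij} \;=\; \sum_i \pi_j \, p_{ji}
\;=\; \pi_j \sum_i p_{ji} \;=\; \pi_j ,
```

since the rows of the transition matrix sum to one; hence π is stationary for the chain.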
Example:
Markov Chain Monte Carlo Methods
Strategy: Markov chain simulation is used when it is impossible, or computationally prohibitive, to sample directly from the target (posterior) distribution.
Note:
• In Markov chain theory, we are given a Markov chain, P, and we construct its equilibrium distribution.
• In MCMC theory, we are “given” a distribution and we want to construct a Markov chain that is reversible with respect to it.
Markov Chain Monte Carlo Methods
General Strategy:
Intuition: Recall that
Markov Chain Monte Carlo Methods
Intuition:
Note: Narrower proposal distribution yields higher probability of acceptance.
Metropolis Algorithm
Metropolis Algorithm: [Metropolis and Ulam, 1949]
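The algorithm steps did not survive extraction; a minimal sketch of the Metropolis algorithm with a symmetric Gaussian random-walk proposal follows. The standard-normal target is chosen only so the result is checkable; note the sampler never uses the normalization constant.

```python
import numpy as np

rng = np.random.default_rng(3)

# Unnormalized log target: standard normal, for illustration only.
def log_target(theta):
    return -0.5 * theta**2

def metropolis(log_target, theta0, n_samples, step=1.0):
    chain = np.empty(n_samples)
    theta, lp = theta0, log_target(theta0)
    accepted = 0
    for i in range(n_samples):
        prop = theta + step * rng.normal()        # symmetric proposal
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(theta)).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples

chain, rate = metropolis(log_target, 0.0, 50_000)
print(chain.mean(), chain.std(), rate)
```

Because the proposal is symmetric, the acceptance ratio involves only the target values, not the proposal density; that simplification is what distinguishes Metropolis from Metropolis-Hastings below.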
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm:
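The acceptance-probability display was lost; for a (possibly asymmetric) proposal density q, with symbols assumed to match the posterior notation used earlier, it is

```latex
\alpha(\theta^* \mid \theta^{k-1}) \;=\;
\min\!\left(1,\;
\frac{\pi(\theta^* \mid D)\, q(\theta^{k-1} \mid \theta^*)}
     {\pi(\theta^{k-1} \mid D)\, q(\theta^* \mid \theta^{k-1})}\right).
```

For symmetric q the proposal densities cancel and this reduces to the Metropolis ratio min(1, π(θ*|D)/π(θ^{k-1}|D)).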
Examples:
Note: Considered one of the top 10 algorithms of the 20th century.
Proposal Distribution
Proposal Distribution: Significantly affects mixing
• Too wide: Too many points rejected and chain stays still for long periods;
• Too narrow: Acceptance ratio is high but algorithm is slow to explore parameter space
• Ideally, it should have similar “shape” to posterior (target) distribution.
• Anisotropic posterior, isotropic proposal;
• Efficiency nonuniform for different parameters
Problem:
Result:
• Recovers efficiency of univariate case
Proposal Distribution and Acceptance Probability
Proposal Distribution: Two basic approaches
• Choose a fixed proposal function
o Independent Metropolis
• Random walk (local Metropolis)
o Two (of several) choices:
Acceptance Probability:
Random Walk Metropolis Algorithm for Parameter Estimation
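The algorithm listing was lost; a sketch of random walk Metropolis applied to a one-parameter estimation problem follows. The Hooke's-law stand-in model, stiffness, noise level, and step size are all invented for illustration; with a flat prior and known σ, the log-posterior is the log-likelihood up to a constant.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: F = k z with known Gaussian noise sigma.
sigma, k_true = 0.1, 20.5
z = np.linspace(0.1, 1.0, 50)
F = k_true * z + rng.normal(0.0, sigma, z.size)

def log_post(k):
    # Flat prior: log-posterior = log-likelihood + const.
    return -0.5 * np.sum((F - k * z) ** 2) / sigma**2

n, step = 20_000, 0.05
chain = np.empty(n)
k, lp = 19.0, log_post(19.0)                  # deliberately off-target start
for i in range(n):
    prop = k + step * rng.normal()            # random walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        k, lp = prop, lp_prop
    chain[i] = k

burned = chain[2000:]                         # discard burn-in
print(burned.mean(), burned.std())
```

The chain mean recovers the OLS estimate of k and the chain spread quantifies its posterior uncertainty, which is the payoff of the Bayesian formulation over the point estimate alone.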
Markov Chain Monte Carlo: Example
Example: Consider the spring model
Markov Chain Monte Carlo: Example
Example: Single parameter c
Markov Chain Monte Carlo: Example
Example: Consider the spring model
Case i:
Markov Chain Monte Carlo: Example
Example: SMA-driven bending actuator -- talk with John Crews
Model:
Estimated Parameters:
Markov Chain Monte Carlo: Example
Example: SMA-driven bending actuator -- talk with John Crews
Transition Kernel and Detailed Balance Condition
Transition Kernel: Recall that
Detailed Balance Condition:
Transition Kernel and Detailed Balance Condition
Detailed Balance Condition: Here
Note:
Transition Kernel: Definition
Sampling Error Variance
Strategy: Treat error variance as parameter to be estimated.
Recall: Assumption that errors are normally distributed yields
Goal: Determine posterior distribution
Strategy:
• Choose prior so that posterior is from same family --- termed conjugate prior.
• For normal distribution with unknown variance, conjugate prior is the inverse-gamma distribution, which is equivalent to a scaled inverse chi-squared distribution.
Definition:
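The defining display was lost; the inverse-gamma density and the resulting conjugate update (symbols α, β for the prior parameters assumed here) are

```latex
\pi(\sigma^2) \;=\; \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,
(\sigma^2)^{-\alpha-1}\, e^{-\beta/\sigma^2},
\qquad
\sigma^2 \mid D, \theta \;\sim\;
\text{Inv-Gamma}\!\left(\alpha + \tfrac{n}{2},\;
\beta + \tfrac{1}{2}\sum_{i=1}^{n}\bigl[v_i - f(t_i;\theta)\bigr]^2\right),
```

so the error variance can be sampled exactly at each MCMC iteration rather than updated by Metropolis steps.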
Sampling Error Variance
Strategy:
Note:
Note:
Sampling Error Variance
Example: Consider the spring model
Related Topics
Note: This is an active research area and there are a number of related topics:
• Burn in and convergence
• Adaptive algorithms
• Population Monte Carlo methods
• Sequential Monte Carlo methods and particle filters
• Gaussian mixture models
• Development of metamodels, surrogates and emulators to improve implementation speeds
References:
• A. Solonen, “Monte Carlo Methods in Parameter Estimation of Nonlinear Models,” Master’s Thesis, 2006.
• H. Haario, E. Saksman, J. Tamminen, “An adaptive Metropolis algorithm,” Bernoulli, 7(2), pp. 223-242, 2001.
• C. Andrieu and J. Thomas, “A tutorial on adaptive MCMC,” Statistics and Computing, 18, pp. 343-373, 2008.
• M. Vihola, “Robust adaptive Metropolis algorithm with coerced acceptance rate,” arXiv:1011.4381v2.