A(n) (extremely) brief/crude introduction to the minimum description length principle

jdu

2006-04

Outline

• Conceptual/non-technical introduction

• Probabilities and Codelengths

• Crude MDL

• Refined MDL

• Other topics

Introduction

• Example: data compression
  – Description methods

Source: Grünwald et al. (2005) Advances in Minimum Description Length: Theory and Applications.

Introduction

• Example: regression
  – Model selection and overfitting
  – Complexity of the model vs. goodness of fit

Introduction

• Models vs. Hypotheses

Introduction

• Crude 2-part version of MDL

Probabilities and Codelengths

• Let $X$ be a finite or countable set
  – A code $C(x)$ for $X$
    • 1-to-1 mapping from $X$ to $\bigcup_{n>0}\{0,1\}^n$
    • $L_C(x)$: number of bits needed to encode $x$ using $C$
  – $P$: probability distribution defined on $X$
    • $P(x)$: the probability of $x$
    • A sequence of (usually iid) observations $x_1, x_2, \ldots, x_n$: $x^n$

Probabilities and Codelengths

• Prefix codes: as examples of uniquely decodable codes
  – no code word is a prefix of any other
  – Example code words (from the figure): c → 1011, d → 1010

Source: http://www.cs.princeton.edu/courses/archive/spring04/cos126/
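The prefix property can be checked mechanically. The sketch below is only an illustration: the code words for `c` and `d` are the two that survive from the slide, while `a` and `b` are hypothetical code words added so that there is something to decode. The function names are likewise assumptions.

```python
def is_prefix_free(code):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(code.values())
    # In lexicographic order, a word that is a prefix of any other word
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Decode a bit string symbol by symbol; works because the code is prefix-free."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in the middle of a code word")
    return symbols


# 'c' and 'd' come from the slide; 'a' and 'b' are hypothetical.
example_code = {"a": "0", "b": "11", "c": "1011", "d": "1010"}
print(is_prefix_free(example_code))
print(decode("0" + "1011" + "11" + "1010", example_code))
```

Because no code word is a prefix of another, the decoder never has to look ahead: the first match is always the right one.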

Probabilities and Codelengths

• Expected codelength of a code $C$: $E_P(L_C(x)) = \sum_x P(x) L_C(x)$
  – Lower bound: $H(P) = -\sum_x P(x) \log_2 P(x)$

• Optimal code
  – if it has minimum expected codelength over all uniquely decodable codes
  – How to design one given $P$?
    • Huffman coding
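As a small numerical illustration of expected codelength and its entropy lower bound, the sketch below uses a hypothetical four-symbol distribution and hypothetical code word lengths (neither is from the slides). For this dyadic distribution the two quantities coincide.

```python
from math import log2


def expected_codelength(P, lengths):
    """Sum over x of P(x) * L_C(x), in bits."""
    return sum(P[x] * lengths[x] for x in P)


def entropy(P):
    """H(P) = -sum over x of P(x) * log2 P(x), in bits."""
    return -sum(p * log2(p) for p in P.values() if p > 0)


# Hypothetical distribution and code word lengths, for illustration only.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

print(expected_codelength(P, lengths), entropy(P))
```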

Probabilities and Codelengths

• Huffman coding

Source: http://star.itc.it/caprile/teaching/algebra-superiore-2001/
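The figure for this slide did not survive, so here is a minimal sketch of the standard Huffman construction: repeatedly merge the two least probable subtrees. The function name and the example distribution are assumptions made for illustration, not taken from the slides.

```python
import heapq
from itertools import count


def huffman_code(P):
    """Build a binary Huffman code for a dict symbol -> probability."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {x: ""}) for x, p in P.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {x: "0" for x in P}
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        # Prepend 0 to one subtree's code words and 1 to the other's.
        merged = {x: "0" + w for x, w in code0.items()}
        merged.update({x: "1" + w for x, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical distribution, for illustration only.
code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)
```

The exact bit patterns depend on how ties are broken, but the code word lengths are optimal for the given distribution.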

Probabilities and Codelengths

• How to design a code for {1, 2, …, M}?
  – Assuming a uniform distribution: 1/M for each number
  – ~ $\log M$ bits
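A minimal sketch of such a uniform (fixed-length) code, assuming each number is written with $\lceil \log_2 M \rceil$ bits. The function names are hypothetical.

```python
def uniform_encode(k, M):
    """Encode k in {1, ..., M} with ceil(log2 M) bits."""
    if not 1 <= k <= M:
        raise ValueError("k must lie in {1, ..., M}")
    width = (M - 1).bit_length()  # equals ceil(log2 M) for M >= 1
    return format(k - 1, "b").zfill(width) if width else ""


def uniform_decode(bits):
    """Inverse of uniform_encode; the empty string decodes to 1."""
    return int(bits, 2) + 1 if bits else 1


print(uniform_encode(5, 8))
```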

Probabilities and Codelengths

• How to design a code for all the positive integers?
  – For each $k$
    • Describe it with $\lceil \log k \rceil$ 0s
    • Followed by a 1
    • Then encode $k$ using the uniform code for $\{1, \ldots, 2^{\lceil \log k \rceil}\}$
    • In total, ~ $2\log k + 1$ bits
  – Can be refined…
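The construction can be sketched as follows. The quantities lost from the slide are assumed here to be $\lceil \log_2 k \rceil$ leading 0s and a uniform code on $\{1, \ldots, 2^{\lceil \log_2 k \rceil}\}$, which is consistent with the stated total of about $2\log k + 1$ bits. The function names are hypothetical.

```python
def integer_encode(k):
    """Prefix code for a positive integer k using 2 * ceil(log2 k) + 1 bits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    m = (k - 1).bit_length()  # equals ceil(log2 k)
    # m zeros, a terminating 1, then k in the m-bit uniform code for {1, ..., 2**m}.
    payload = format(k - 1, "b").zfill(m) if m else ""
    return "0" * m + "1" + payload


def integer_decode(bits):
    """Decode one integer from the front of bits; return (k, remaining bits)."""
    m = bits.index("1")  # number of leading zeros
    payload = bits[m + 1:m + 1 + m]
    if len(payload) < m:
        raise ValueError("bit string ends in the middle of a code word")
    k = int(payload, 2) + 1 if m else 1
    return k, bits[2 * m + 1:]


print([integer_encode(k) for k in (1, 2, 3, 4, 5)])
```

Because the leading 0s announce how many payload bits follow, several integers can be concatenated and still decoded unambiguously.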

Probabilities and Codelengths

• Let $P$ be a probability distribution over $X$, then there exists a code $C$ for $X$ such that: $L_C(x) = \lceil -\log P(x) \rceil$

• Let $C$ be a uniquely decodable code over $X$, then there exists a (possibly defective) probability distribution $P$ such that: $L_C(x^n) = -\log P(x^n)$
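Both directions of this correspondence can be checked numerically. The sketch below assumes the standard construction: code word lengths $\lceil -\log_2 P(x) \rceil$ in one direction, and $P(x) = 2^{-L_C(x)}$ in the other. The example distribution and function names are hypothetical.

```python
from math import ceil, log2


def shannon_lengths(P):
    """Code word lengths ceil(-log2 P(x)) for every x with P(x) > 0."""
    return {x: ceil(-log2(p)) for x, p in P.items() if p > 0}


def kraft_sum(lengths):
    """Sum of 2**(-L(x)); a prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -length for length in lengths.values())


def implied_distribution(lengths):
    """P(x) = 2**(-L(x)); sums to at most 1, so it may be defective."""
    return {x: 2.0 ** -length for x, length in lengths.items()}


# Hypothetical distribution, for illustration only.
P = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
lengths = shannon_lengths(P)
print(lengths, kraft_sum(lengths))
```

Here the Kraft sum is below 1, so the implied distribution is defective: the code wastes some codelength relative to an ideal (non-integer) code for $P$.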

Probabilities and Codelengths

• Codelength revisited

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – A sequence: $X_1, X_2, \ldots, X_N$
  – Special case: 0-th order: Bernoulli model (biased coin)
    • Maximum Likelihood estimator

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – Special case: first order Markov chain B(1)
    • MLE

Crude MDL

• Preliminary: k-th order Markov chain on X={0,1}
  – $2^k$ parameters
  – Log likelihood function: …
  – MLE: …
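The formulas on these slides were lost in extraction, so the following is only a sketch of the usual counting-based estimator, not necessarily the slide's exact form. It assumes the data is a string of `"0"`/`"1"` characters and that the likelihood is conditioned on the first k symbols. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def markov_mle(bits, k):
    """MLE of theta[context] = P(next = 1 | previous k symbols), by counting."""
    counts = Counter((bits[i - k:i], bits[i]) for i in range(k, len(bits)))
    theta = {}
    for context in {ctx for ctx, _ in counts}:
        n0, n1 = counts[(context, "0")], counts[(context, "1")]
        theta[context] = n1 / (n0 + n1)
    return theta


def log_likelihood(bits, k, theta):
    """log2 P(x_{k+1}, ..., x_n | x_1, ..., x_k, theta) for a k-th order chain."""
    total = 0.0
    for i in range(k, len(bits)):
        p_one = theta[bits[i - k:i]]
        total += log2(p_one if bits[i] == "1" else 1.0 - p_one)
    return total


data = "0010110111"  # hypothetical data, for illustration only
print(markov_mle(data, 0), markov_mle(data, 1))
```

For k = 0 the only context is the empty string, and the estimate reduces to the fraction of 1s, which is the Bernoulli case.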

Crude MDL

• Question: Given data $D = x^n$, find the Markov chain that best explains D.
  – We do not want to restrict ourselves to chains of fixed order
    • How to avoid overfitting?
    • Obviously, an (n-1)-th order Markov model would always fit the data the best

Crude MDL

• two-part MDL revisited

Crude MDL

• Description length of data given hypothesis

Crude MDL

• Description length of hypothesis
  – The code should not change with the sample size n.
  – Different codes will lead to preferences of different hypotheses
  – How to design a code that
    • Leads to good inferences with small, practically relevant sample sizes?

Crude MDL

• An "intuitive" and "reasonable" code for the k-th order Markov chain
  – First describe $k$ using $2\log k + 1$ bits
  – Then describe the $d = 2^k$ parameters
    • Assume $n$ is given in advance
  – For each $\theta$ in the MLE $\{\theta_{[1|000\ldots000]}, \ldots, \theta_{[1|111\ldots111]}\}$, the best precision we can achieve by counting is $1/(n+1)$
  – Describe each $\theta$ with $\log(n+1)$ bits
  – $L(H) = 2\log k + 1 + d\log(n+1)$
  – $L(H) + L(D|H) = 2\log k + 1 + d\log(n+1) - \log P(D \mid k, \theta)$
  – For a given $k$, only the MLE $\theta$ needs to be considered
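A rough sketch of order selection with this two-part code follows. Several choices are assumptions not stated on the slide: only orders $k \ge 1$ are compared (since $2\log k + 1$ is undefined at $k = 0$), the first k symbols are charged one bit each, the remaining symbols are coded with the MLE conditioned on their context, and logarithms are base 2. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def data_codelength(bits, k):
    """L(D|H): k literal bits plus -log2 P(rest | first k symbols, MLE theta)."""
    counts = Counter((bits[i - k:i], bits[i]) for i in range(k, len(bits)))
    total = float(k)  # assumption: the first k symbols are sent literally
    for context in {ctx for ctx, _ in counts}:
        n0, n1 = counts[(context, "0")], counts[(context, "1")]
        for c in (n0, n1):
            if c:  # zero counts contribute nothing (0 * log 0 = 0)
                total -= c * log2(c / (n0 + n1))
    return total


def hypothesis_codelength(n, k):
    """L(H) = 2 log k + 1 + d log(n + 1) with d = 2**k, as on the slide."""
    return 2 * log2(k) + 1 + (2 ** k) * log2(n + 1)


def select_order(bits, max_k):
    """Return the order k in 1..max_k that minimizes L(H) + L(D|H)."""
    return min(
        range(1, max_k + 1),
        key=lambda k: hypothesis_codelength(len(bits), k) + data_codelength(bits, k),
    )


print(select_order("01" * 100, 4), select_order("0011" * 50, 4))
```

Higher orders never fit worse, but each extra order doubles the number of parameters to describe, so the total codelength stops improving once the structure in the data is captured.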

Crude MDL

• Good news
  – We have found a principled manner to encode data D using H

• Bad news
  – We have not found clear guidelines to design codes for H

Outline

• Probabilities and Codelengths

• Crude MDL

• Refined MDL

• Other issues

Refined MDL

• Universal codes and universal distributions
  – The maximum likelihood code depends on the data
    • How to describe the data in an unambiguous manner?
  – Design a code such that for every possible observation, its codelength corresponds to its ML? Impossible

Refined MDL

• Worst-case regret

• Optimal universal model
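The formulas for this slide did not survive extraction. As a hedged reconstruction, the standard definitions in the MDL literature are given below. The symbols $\bar{P}$ (a candidate universal distribution), $\mathcal{M}$ (the model) and $\hat{\theta}(x^n)$ (the ML estimate for $x^n$) are notational assumptions, not necessarily the slide's own.

```latex
% Regret of \bar{P} on x^n: extra bits relative to the best element of M in hindsight
\mathcal{R}(\bar{P}, x^n) = -\log \bar{P}(x^n) - \bigl[-\log P(x^n \mid \hat{\theta}(x^n))\bigr]

% Worst-case regret over all sequences of length n
\mathcal{R}_{\max}(\bar{P}) = \max_{x^n} \mathcal{R}(\bar{P}, x^n)

% Optimal universal model: the distribution with the smallest worst-case regret
\bar{P}^{*} = \arg\min_{\bar{P}} \max_{x^n} \mathcal{R}(\bar{P}, x^n)
```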

Refined MDL

• Normalized maximum likelihood (NML)

• Minimizing -logNML
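As a concrete instance, the sketch below computes NML codelengths for the Bernoulli model, assuming the standard definition: the maximized likelihood of a sequence divided by the sum of maximized likelihoods over all sequences of the same length. The function names are hypothetical, and the data is assumed to be a string of `"0"`/`"1"` characters.

```python
from math import comb, log2


def max_likelihood(n1, n):
    """P(x^n | theta_hat) for one particular sequence with n1 ones out of n."""
    if n == 0:
        return 1.0
    theta = n1 / n
    return theta ** n1 * (1.0 - theta) ** (n - n1)  # 0.0 ** 0 evaluates to 1.0


def bernoulli_complexity(n):
    """log2 of the sum of maximized likelihoods over all 2**n sequences."""
    # Sequences with the same number of ones share the same maximized likelihood.
    return log2(sum(comb(n, j) * max_likelihood(j, n) for j in range(n + 1)))


def nml_codelength(bits):
    """-log2 NML(x^n) = -log2 P(x^n | theta_hat) + model complexity."""
    n = len(bits)
    return -log2(max_likelihood(bits.count("1"), n)) + bernoulli_complexity(n)


print(bernoulli_complexity(1), nml_codelength("0010110111"))
```

The codelength splits into a fit term and a complexity term that depends only on the model and the sample size, not on the particular data.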

Refined MDL

• Complexity of a model

  – The more sequences that can be fit well by an element of M, the larger M's complexity
  – Would it lead to a "right" balance between complexity and fit?
    • Hopefully…

Refined MDL

• General refined MDL
