markovian (and conceivably causal) representations of ...cshalizi/talks/2010-07-10-uai.pdf ·...

135
Markovian (and conceivably causal) representations of stochastic processes Cosma Shalizi Statistics Dept., Carnegie Mellon University & Santa Fe Institute 10 July 2010 UAI

Upload: others

Post on 12-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Markovian (and conceivably causal)representations of stochastic processes

Cosma Shalizi

Statistics Dept., Carnegie Mellon University & Santa Fe Institute

10 July 2010UAI

Page 2: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“State”

In classical physics or dynamical systems, “state” is the presentvariable which fixes all future observablesBack away from determinism: state determines distribution offuture observablesWould like a small stateWould like the state to be well-behaved, e.g. homogeneousMarkovTry construct states by constructing predictions

Cosma Shalizi Markovian Representations

Page 3: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Notation etc.

Upper-case letters are random variables, lower-case theirrealizationsStochastic process . . . ,X−1,X0,X1,X2, . . .X t

s = (Xs,Xs+1, . . .Xt−1,Xt )Past up to and including t is X t

−∞, future is X∞t+1Discrete time is not required but cuts down on measure theory

Cosma Shalizi Markovian Representations

Page 4: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Making a Prediction

Look at X t−∞, make a guess about X∞t+1

Most general guess is a probability distributionOnly ever attend to selected aspects of X t

−∞mean, variance, phase of 1st three Fourier modes, . . .

∴ guess is a function or statistic of X t−∞

What’s a good statistic to use?

Cosma Shalizi Markovian Representations

Page 5: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Predictive Sufficiency

For any statistic σ,

I[X∞t+1; X t−∞] ≥ I[X∞t+1;σ(X t

−∞)]

σ is predictively sufficient iff

I[X∞t+1; X t−∞] = I[X∞t+1;σ(X t

−∞)]

Sufficient statistics retain all predictive information in the data

Cosma Shalizi Markovian Representations

Page 6: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Why Care About Sufficiency?

Optimal strategy, under any loss function, only needs asufficient statistic (Blackwell & Girshick)Strategies using insufficient statistics can generally beimproved (Blackwell & Rao)Excuse for not worrying about particular loss functions

Cosma Shalizi Markovian Representations

Page 7: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Why Care About Sufficiency?

Optimal strategy, under any loss function, only needs asufficient statistic (Blackwell & Girshick)

Strategies using insufficient statistics can generally beimproved (Blackwell & Rao)Excuse for not worrying about particular loss functions

Cosma Shalizi Markovian Representations

Page 8: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Why Care About Sufficiency?

Optimal strategy, under any loss function, only needs asufficient statistic (Blackwell & Girshick)Strategies using insufficient statistics can generally beimproved (Blackwell & Rao)

Excuse for not worrying about particular loss functions

Cosma Shalizi Markovian Representations

Page 9: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Why Care About Sufficiency?

Optimal strategy, under any loss function, only needs asufficient statistic (Blackwell & Girshick)Strategies using insufficient statistics can generally beimproved (Blackwell & Rao)Excuse for not worrying about particular loss functions

Cosma Shalizi Markovian Representations

Page 10: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“Causal” States

(Crutchfield and Young, 1989)

Histories a and b are equivalent iff

Pr(X∞t+1|X t

−∞ = a)

= Pr(X∞t+1|X t

−∞ = b)

[a] ≡ all histories equivalent to aThe statistic of interest, the causal state, is

ε(x t−∞) = [x t

−∞]

Set st = ε(x t−1−∞)

A state is an equivalence class of histories and a distributionover future eventsIID = 1 state, periodic = p states

Cosma Shalizi Markovian Representations

Page 11: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“Causal” States

(Crutchfield and Young, 1989)

Histories a and b are equivalent iff

Pr(X∞t+1|X t

−∞ = a)

= Pr(X∞t+1|X t

−∞ = b)

[a] ≡ all histories equivalent to a

The statistic of interest, the causal state, is

ε(x t−∞) = [x t

−∞]

Set st = ε(x t−1−∞)

A state is an equivalence class of histories and a distributionover future eventsIID = 1 state, periodic = p states

Cosma Shalizi Markovian Representations

Page 12: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“Causal” States

(Crutchfield and Young, 1989)

Histories a and b are equivalent iff

Pr(X∞t+1|X t

−∞ = a)

= Pr(X∞t+1|X t

−∞ = b)

[a] ≡ all histories equivalent to aThe statistic of interest, the causal state, is

ε(x t−∞) = [x t

−∞]

Set st = ε(x t−1−∞)

A state is an equivalence class of histories and a distributionover future eventsIID = 1 state, periodic = p states

Cosma Shalizi Markovian Representations

Page 13: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“Causal” States

(Crutchfield and Young, 1989)

Histories a and b are equivalent iff

Pr(X∞t+1|X t

−∞ = a)

= Pr(X∞t+1|X t

−∞ = b)

[a] ≡ all histories equivalent to aThe statistic of interest, the causal state, is

ε(x t−∞) = [x t

−∞]

Set st = ε(x t−1−∞)

A state is an equivalence class of histories and a distributionover future events

IID = 1 state, periodic = p states

Cosma Shalizi Markovian Representations

Page 14: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

“Causal” States

(Crutchfield and Young, 1989)

Histories a and b are equivalent iff

Pr(X∞t+1|X t

−∞ = a)

= Pr(X∞t+1|X t

−∞ = b)

[a] ≡ all histories equivalent to aThe statistic of interest, the causal state, is

ε(x t−∞) = [x t

−∞]

Set st = ε(x t−1−∞)

A state is an equivalence class of histories and a distributionover future eventsIID = 1 state, periodic = p states

Cosma Shalizi Markovian Representations

Page 15: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

set of histories, color-coded by conditional distribution of futures

Cosma Shalizi Markovian Representations

Page 16: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Notation and setting

Partitioning histories into causal states

Cosma Shalizi Markovian Representations

Page 17: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Sufficiency

(Shalizi and Crutchfield, 2001)

I[X∞t+1; X t−∞] = I[X∞t+1; ε(X t

−∞)]

because

Pr(X∞t+1|St = ε(x t

−∞))

=

∫y∈[x t

−∞]Pr(X∞t+1|X t

−∞ = y)

Pr(X t−∞ = y |St = ε(x t

−∞))

dy

= Pr(X∞t+1|X t

−∞ = x t−∞)

Cosma Shalizi Markovian Representations

Page 18: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Sufficiency

(Shalizi and Crutchfield, 2001)

I[X∞t+1; X t−∞] = I[X∞t+1; ε(X t

−∞)]

because

Pr(X∞t+1|St = ε(x t

−∞))

=

∫y∈[x t

−∞]Pr(X∞t+1|X t

−∞ = y)

Pr(X t−∞ = y |St = ε(x t

−∞))

dy

= Pr(X∞t+1|X t

−∞ = x t−∞)

Cosma Shalizi Markovian Representations

Page 19: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

A non-sufficient partition of histories

Cosma Shalizi Markovian Representations

Page 20: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Effect of insufficiency on predictive distributions

Cosma Shalizi Markovian Representations

Page 21: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Markov Properties

Future observations are independent of the past given thecausal state:

X∞t+1 |= X t−∞|St+1

by sufficiency:

Pr(X∞t+1|X t

−∞ = x t−∞,St+1 = ε(x t

−∞))

= Pr(X∞t+1|X t

−∞ = x t−∞)

= Pr(X∞t+1|St+1 = ε(x t

−∞))

Cosma Shalizi Markovian Representations

Page 22: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Markov Properties

Future observations are independent of the past given thecausal state:

X∞t+1 |= X t−∞|St+1

by sufficiency:

Pr(X∞t+1|X t

−∞ = x t−∞,St+1 = ε(x t

−∞))

= Pr(X∞t+1|X t

−∞ = x t−∞)

= Pr(X∞t+1|St+1 = ε(x t

−∞))

Cosma Shalizi Markovian Representations

Page 23: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Recursive Updating/Deterministic Transitions

Recursive transitions for states:

ε(x t+1−∞) = T (ε(x t

−∞), xt+1)

Automata theory: “deterministic transitions” (even though thereare probabilities)In continuous time:

ε(x t+h−∞) = T (ε(x t

−∞), x t+ht )

Cosma Shalizi Markovian Representations

Page 24: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

If u ∼ v , any future event F , and single observation a

Pr(X∞t+1 ∈ aF |X t

−∞ = u)

= Pr(X∞t+1 ∈ aF |X t

−∞ = v)

Pr(Xt+1 = a,X∞t+2 ∈ F |X t

−∞ = u)

= Pr(Xt+1 = a,X∞t+2 ∈ F |X t

−∞ = v)

Pr(

X∞t+2 ∈ F |X t+1−∞ = ua

)Pr(Xt+1 = a|X t

−∞ = u)

= Pr(

X∞t+2 ∈ F |X t+1−∞ = va

)Pr(Xt+1 = a|X t

−∞ = v)

Pr(

X∞t+2 ∈ F |X t+1−∞ = ua

)= Pr

(X∞t+2 ∈ F |X t+1

−∞ = va)

ua ∼ va

(same for continuous values or time but need more measure theory)

Cosma Shalizi Markovian Representations

Page 25: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

If u ∼ v , any future event F , and single observation a

Pr(X∞t+1 ∈ aF |X t

−∞ = u)

= Pr(X∞t+1 ∈ aF |X t

−∞ = v)

Pr(Xt+1 = a,X∞t+2 ∈ F |X t

−∞ = u)

= Pr(Xt+1 = a,X∞t+2 ∈ F |X t

−∞ = v)

Pr(

X∞t+2 ∈ F |X t+1−∞ = ua

)Pr(Xt+1 = a|X t

−∞ = u)

= Pr(

X∞t+2 ∈ F |X t+1−∞ = va

)Pr(Xt+1 = a|X t

−∞ = v)

Pr(

X∞t+2 ∈ F |X t+1−∞ = ua

)= Pr

(X∞t+2 ∈ F |X t+1

−∞ = va)

ua ∼ va

(same for continuous values or time but need more measure theory)

Cosma Shalizi Markovian Representations

Page 26: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Causal States are Markovian

S∞t+1 |= St−1−∞|St

because

S∞t+1 = T (St ,X∞t )

andX∞t |=

{X t−1−∞,S

t−1−∞

}|St

Also, the transitions are homogeneous

Cosma Shalizi Markovian Representations

Page 27: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Causal States are Markovian

S∞t+1 |= St−1−∞|St

because

S∞t+1 = T (St ,X∞t )

andX∞t |=

{X t−1−∞,S

t−1−∞

}|St

Also, the transitions are homogeneous

Cosma Shalizi Markovian Representations

Page 28: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Causal States are Markovian

S∞t+1 |= St−1−∞|St

because

S∞t+1 = T (St ,X∞t )

andX∞t |=

{X t−1−∞,S

t−1−∞

}|St

Also, the transitions are homogeneous

Cosma Shalizi Markovian Representations

Page 29: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Causal States are Markovian

S∞t+1 |= St−1−∞|St

because

S∞t+1 = T (St ,X∞t )

andX∞t |=

{X t−1−∞,S

t−1−∞

}|St

Also, the transitions are homogeneous

Cosma Shalizi Markovian Representations

Page 30: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimality

ε is minimal sufficient= can be computed from any other sufficient statistic

= for any sufficient η, exists a function g such that

ε(X t−∞) = g(η(X t

−∞))

Therefore, if η is sufficient

I[ε(X t−∞); X t

−∞] ≤ I[η(X t−∞); X t

−∞]

Cosma Shalizi Markovian Representations

Page 31: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimality

ε is minimal sufficient= can be computed from any other sufficient statistic= for any sufficient η, exists a function g such that

ε(X t−∞) = g(η(X t

−∞))

Therefore, if η is sufficient

I[ε(X t−∞); X t

−∞] ≤ I[η(X t−∞); X t

−∞]

Cosma Shalizi Markovian Representations

Page 32: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimality

ε is minimal sufficient= can be computed from any other sufficient statistic= for any sufficient η, exists a function g such that

ε(X t−∞) = g(η(X t

−∞))

Therefore, if η is sufficient

I[ε(X t−∞); X t

−∞] ≤ I[η(X t−∞); X t

−∞]

Cosma Shalizi Markovian Representations

Page 33: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Sufficient, but not minimal, partition of histories

Cosma Shalizi Markovian Representations

Page 34: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Coarser than the causal states, but not sufficient

Cosma Shalizi Markovian Representations

Page 35: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Uniqueness

There is really no other minimal sufficient statistic

If η is minimal, there is an h such that

η = h(ε) a.s.

but ε = g(η) (a.s.) so

g(h(ε)) = ε

h(g(η)) = η

ε and η partition histories in the same way (a.s.)

Cosma Shalizi Markovian Representations

Page 36: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Uniqueness

There is really no other minimal sufficient statisticIf η is minimal, there is an h such that

η = h(ε) a.s.

but ε = g(η) (a.s.) so

g(h(ε)) = ε

h(g(η)) = η

ε and η partition histories in the same way (a.s.)

Cosma Shalizi Markovian Representations

Page 37: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Uniqueness

There is really no other minimal sufficient statisticIf η is minimal, there is an h such that

η = h(ε) a.s.

but ε = g(η) (a.s.)

so

g(h(ε)) = ε

h(g(η)) = η

ε and η partition histories in the same way (a.s.)

Cosma Shalizi Markovian Representations

Page 38: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Uniqueness

There is really no other minimal sufficient statisticIf η is minimal, there is an h such that

η = h(ε) a.s.

but ε = g(η) (a.s.) so

g(h(ε)) = ε

h(g(η)) = η

ε and η partition histories in the same way (a.s.)

Cosma Shalizi Markovian Representations

Page 39: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal stochasticity

If Rt = η(X t−1−∞) is also sufficient, then

H[Rt+1|Rt ] ≥ H[St+1|St ]

∴ the predictive states are the closest we get to a deterministicmodel, without losing power

Cosma Shalizi Markovian Representations

Page 40: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal stochasticity

If Rt = η(X t−1−∞) is also sufficient, then

H[Rt+1|Rt ] ≥ H[St+1|St ]

∴ the predictive states are the closest we get to a deterministicmodel, without losing power

Cosma Shalizi Markovian Representations

Page 41: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Entropy Rate

h1 ≡ limn→∞

H[Xn|X n−11 ] = lim

n→∞H[Xn|Sn]

= H[X1|S1]

so the predictive states lets us calculate the entropy rateand do source coding

Cosma Shalizi Markovian Representations

Page 42: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Entropy Rate

h1 ≡ limn→∞

H[Xn|X n−11 ] = lim

n→∞H[Xn|Sn]

= H[X1|S1]

so the predictive states lets us calculate the entropy rate

and do source coding

Cosma Shalizi Markovian Representations

Page 43: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Entropy Rate

h1 ≡ limn→∞

H[Xn|X n−11 ] = lim

n→∞H[Xn|Sn]

= H[X1|S1]

so the predictive states lets us calculate the entropy rateand do source coding

Cosma Shalizi Markovian Representations

Page 44: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal Markovian Representation

The observed process (Xt ) is non-Markovian and ugly

But it is generated from a homogeneous Markov process (St )After minimization, this representation is (essentially) uniqueCan exist small Markovian representations, but then alwayshave distributions over those states. . .. . . and those distributions correspond to predictive states

Cosma Shalizi Markovian Representations

Page 45: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal Markovian Representation

The observed process (Xt ) is non-Markovian and uglyBut it is generated from a homogeneous Markov process (St )

After minimization, this representation is (essentially) uniqueCan exist small Markovian representations, but then alwayshave distributions over those states. . .. . . and those distributions correspond to predictive states

Cosma Shalizi Markovian Representations

Page 46: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal Markovian Representation

The observed process (Xt ) is non-Markovian and uglyBut it is generated from a homogeneous Markov process (St )After minimization, this representation is (essentially) unique

Can exist small Markovian representations, but then alwayshave distributions over those states. . .. . . and those distributions correspond to predictive states

Cosma Shalizi Markovian Representations

Page 47: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal Markovian Representation

The observed process (Xt ) is non-Markovian and uglyBut it is generated from a homogeneous Markov process (St )After minimization, this representation is (essentially) uniqueCan exist small Markovian representations, but then alwayshave distributions over those states. . .

. . . and those distributions correspond to predictive states

Cosma Shalizi Markovian Representations

Page 48: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Minimal Markovian Representation

The observed process (Xt ) is non-Markovian and uglyBut it is generated from a homogeneous Markov process (St )After minimization, this representation is (essentially) uniqueCan exist small Markovian representations, but then alwayshave distributions over those states. . .. . . and those distributions correspond to predictive states

Cosma Shalizi Markovian Representations

Page 49: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

What Sort of Markov Model?

Common-or-garden HMM:

St+1 |= Xt |St

But hereSt+1 = T (St ,Xt )

This is a chain with complete connections (Onicescu andMihoc, 1935; Iosifescu and Grigorescu, 1990)

Cosma Shalizi Markovian Representations

Page 50: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

What Sort of Markov Model?

Common-or-garden HMM:

St+1 |= Xt |St

But hereSt+1 = T (St ,Xt )

This is a chain with complete connections (Onicescu andMihoc, 1935; Iosifescu and Grigorescu, 1990)

Cosma Shalizi Markovian Representations

Page 51: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

What Sort of Markov Model?

Common-or-garden HMM:

St+1 |= Xt |St

But hereSt+1 = T (St ,Xt )

This is a chain with complete connections (Onicescu andMihoc, 1935; Iosifescu and Grigorescu, 1990)

Cosma Shalizi Markovian Representations

Page 52: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

HMM

S1

X1

S2

X2

S3

X3

CCC

S1

X1

S2

X2

S3

X3

Cosma Shalizi Markovian Representations

Page 53: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

HMM

S1

X1

S2

X2

S3

X3

CCC

S1

X1

S2

X2

S3

X3

Cosma Shalizi Markovian Representations

Page 54: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Example of a CCC: Even Process

2 1B | 1.0B | 0.5

A | 0.5

Blocks of As of any length, separated by even-length blocks ofBs

Not Markov at any order

Cosma Shalizi Markovian Representations

Page 55: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Example of a CCC: Even Process

2 1B | 1.0B | 0.5

A | 0.5

Blocks of As of any length, separated by even-length blocks ofBs

Not Markov at any order

Cosma Shalizi Markovian Representations

Page 56: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)

Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 57: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)

Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 58: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)

Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 59: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)

Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 60: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)

Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 61: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)

Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 62: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Inventions

Statistical relevance basis (Salmon, 1971, 1984)Measure-theoretic prediction process (Knight, 1975, 1992)Forecasting/true measure complexity (Grassberger, 1986)Causal states, ε machine (Crutchfield and Young, 1989)Observable operator model (Jaeger, 2000)Predictive state representations (Littman et al., 2002)Sufficient posterior representation (Langford et al., 2009)

Cosma Shalizi Markovian Representations

Page 63: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

How Broad Are These Results?

Knight (1975, 1992) gave most general constructionsNon-stationary Xt continuous (but discrete works as special case)Xt with values in a Lusin space

(= image of a complete separable

metrizable space under a measurable bijection)

St is a homogeneous strong Markov process withdeterministic updatingSt has cadlag sample paths (in appropriate topologies oninfinite-dimensional distributions)

Cosma Shalizi Markovian Representations

Page 64: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

How Broad Are These Results?

Knight (1975, 1992) gave most general constructionsNon-stationary Xt continuous (but discrete works as special case)Xt with values in a Lusin space (= image of a complete separable

metrizable space under a measurable bijection)

St is a homogeneous strong Markov process withdeterministic updatingSt has cadlag sample paths (in appropriate topologies oninfinite-dimensional distributions)

Cosma Shalizi Markovian Representations

Page 65: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

How Broad Are These Results?

Knight (1975, 1992) gave most general constructionsNon-stationary Xt continuous (but discrete works as special case)Xt with values in a Lusin space (= image of a complete separable

metrizable space under a measurable bijection)

St is a homogeneous strong Markov process withdeterministic updatingSt has cadlag sample paths (in appropriate topologies oninfinite-dimensional distributions)

Cosma Shalizi Markovian Representations

Page 66: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

A Cousin: The Information Bottleneck

(Tishby et al., 1999)

For inputs X and outputs Y , fix β > 0, find η(X ), the bottleneckvariable, maximizing

I[η(X ); Y ]− βI[η(X ); X ]

give up 1 bit of predictive information for β bits of memoryPredictive sufficiency comes as β →∞, unwilling to lose anypredictive power

Cosma Shalizi Markovian Representations

Page 67: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

A Cousin: The Information Bottleneck

(Tishby et al., 1999)

For inputs X and outputs Y , fix β > 0, find η(X ), the bottleneckvariable, maximizing

I[η(X ); Y ]− βI[η(X ); X ]

give up 1 bit of predictive information for β bits of memory

Predictive sufficiency comes as β →∞, unwilling to lose anypredictive power

Cosma Shalizi Markovian Representations

Page 68: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

A Cousin: The Information Bottleneck

(Tishby et al., 1999)

For inputs X and outputs Y , fix β > 0, find η(X ), the bottleneckvariable, maximizing

I[η(X ); Y ]− βI[η(X ); X ]

give up 1 bit of predictive information for β bits of memoryPredictive sufficiency comes as β →∞, unwilling to lose anypredictive power

Cosma Shalizi Markovian Representations

Page 69: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Extension 1: Input-Output

(Littman et al., 2002; Shalizi, 2001, ch. 7)

System output (Xt ), input (Yt )Histories x t

−∞, y t−∞ have distributions of output xt+1 for each

further input yt+1Equivalence class these distributions and enforce recursiveupdatingInternal states of the system, not trying to predict future inputs

Cosma Shalizi Markovian Representations

Page 70: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Extension 2: Space and Time

(Shalizi, 2003; Shalizi et al., 2004, 2006; Jänicke et al., 2007)

Dynamic random field X (~r , t)Past cone: points in space-time which could matter to X (~r , t)Future cone: points in space-time for which X (~r , t) could matter

past

future

Equivalence-class past cone configurations byconditional distributions over future conesS(~r , t) is a Markov fieldMinimal sufficiency, recursive updating, etc., allgo through

Cosma Shalizi Markovian Representations

Page 71: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Extension 2: Space and Time

(Shalizi, 2003; Shalizi et al., 2004, 2006; Jänicke et al., 2007)

Dynamic random field X (~r , t)Past cone: points in space-time which could matter to X (~r , t)Future cone: points in space-time for which X (~r , t) could matter

past

future

Equivalence-class past cone configurations byconditional distributions over future conesS(~r , t) is a Markov fieldMinimal sufficiency, recursive updating, etc., allgo through

Cosma Shalizi Markovian Representations

Page 72: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Extension 2: Space and Time

(Shalizi, 2003; Shalizi et al., 2004, 2006; Jänicke et al., 2007)

Dynamic random field X (~r , t)Past cone: points in space-time which could matter to X (~r , t)Future cone: points in space-time for which X (~r , t) could matter

past

future

Equivalence-class past cone configurations byconditional distributions over future conesS(~r , t) is a Markov fieldMinimal sufficiency, recursive updating, etc., allgo through

Cosma Shalizi Markovian Representations

Page 73: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 74: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction

= H[ε(X t−∞)] for discrete causal states

= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 75: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states

= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 76: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states= expected algorithmic sophistication (Gács et al., 2001)

= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 77: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes

= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 78: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocesses

Property of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 79: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

SufficiencyMarkov PropertiesMinimalities

Statistical Complexity

Definition (Grassberger, 1986; Crutchfield and Young, 1989)

C ≡ I[ε(X t−∞); X t

−∞] is the statistical forecasting complexityof the process

= amount of information about the past needed for optimalprediction= H[ε(X t

−∞)] for discrete causal states= expected algorithmic sophistication (Gács et al., 2001)= log(period) for period processes= log(geometric mean(recurrence time)) for stationaryprocessesProperty of the process, not learning problem

Cosma Shalizi Markovian Representations

Page 80: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Data

Everything so far has been math/probability

(The Oracle tells us the infinite-dimensional distribution of X )

Can we do some statistics and find the states?Two senses of “find”: learn in a fixed model vs. discover theright model

Cosma Shalizi Markovian Representations

Page 81: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Data

Everything so far has been math/probability(The Oracle tells us the infinite-dimensional distribution of X )

Can we do some statistics and find the states?Two senses of “find”: learn in a fixed model vs. discover theright model

Cosma Shalizi Markovian Representations

Page 82: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Data

Everything so far has been math/probability(The Oracle tells us the infinite-dimensional distribution of X )

Can we do some statistics and find the states?Two senses of “find”: learn in a fixed model vs. discover theright model

Cosma Shalizi Markovian Representations

Page 83: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Learning

Given states and transitions (ε,T ), realization xn1

Estimate Pr (Xt+1 = x |St = s)

Just estimation for stochastic processesEasier than ordinary HMMs because St is a function oftrajectoryExponential families in the all-discrete case, very tractable

Cosma Shalizi Markovian Representations

Page 84: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Learning

Given states and transitions (ε,T ), realization xn1

Estimate Pr (Xt+1 = x |St = s)

Just estimation for stochastic processesEasier than ordinary HMMs because St is a function oftrajectoryExponential families in the all-discrete case, very tractable

Cosma Shalizi Markovian Representations

Page 85: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Discovery

Given xn1

Estimate ε,T ,Pr (Xt+1 = x |St = s)

Inspiration: “geometry from a time series” in nonlineardynamicsConditional independence testing reminiscent of PCalgorithmFunction learning approach (Langford et al., 2009)Nobody seems to have tried non-parametric Bayes

Cosma Shalizi Markovian Representations

Page 86: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Discovery

Given xn1

Estimate ε,T ,Pr (Xt+1 = x |St = s)

Inspiration: “geometry from a time series” in nonlineardynamicsConditional independence testing reminiscent of PCalgorithmFunction learning approach (Langford et al., 2009)Nobody seems to have tried non-parametric Bayes

Cosma Shalizi Markovian Representations

Page 87: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

“Geometry from a Time Series”

Deterministic dynamical system with state zt on a smoothmanifold of dimension m, zt+1 = f (zt )Only identified up to a smooth, invertible change of coordinates(diffeomorphism)Observe a time series of a single smooth, instantaneousfunction of state xt = g(zt )Set st = (xt , xt−1, . . . xt−k+1)

Generically, if k ≥ 2m + 1, then zt = φ(st )φ is smooth and invertibleφ commutes with time evolution, φ(st+1) = f (φ(st ))Regressing st+1 on st gives φ−1 ◦ fIdea due to Packard et al. (1980); Takens (1981), modern review in Kantz and

Schreiber (2004)

Cosma Shalizi Markovian Representations

Page 88: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

“Geometry from a Time Series”

Deterministic dynamical system with state zt on a smoothmanifold of dimension m, zt+1 = f (zt )Only identified up to a smooth, invertible change of coordinates(diffeomorphism)Observe a time series of a single smooth, instantaneousfunction of state xt = g(zt )Set st = (xt , xt−1, . . . xt−k+1)Generically, if k ≥ 2m + 1, then zt = φ(st )φ is smooth and invertibleφ commutes with time evolution, φ(st+1) = f (φ(st ))Regressing st+1 on st gives φ−1 ◦ f

Idea due to Packard et al. (1980); Takens (1981), modern review in Kantz and

Schreiber (2004)

Cosma Shalizi Markovian Representations

Page 89: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

“Geometry from a Time Series”

Deterministic dynamical system with state zt on a smoothmanifold of dimension m, zt+1 = f (zt )Only identified up to a smooth, invertible change of coordinates(diffeomorphism)Observe a time series of a single smooth, instantaneousfunction of state xt = g(zt )Set st = (xt , xt−1, . . . xt−k+1)Generically, if k ≥ 2m + 1, then zt = φ(st )φ is smooth and invertibleφ commutes with time evolution, φ(st+1) = f (φ(st ))Regressing st+1 on st gives φ−1 ◦ fIdea due to Packard et al. (1980); Takens (1981), modern review in Kantz and

Schreiber (2004)

Cosma Shalizi Markovian Representations

Page 90: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

CSSR: Causal State Splitting Reconstruction

Key observation: Recursion + one-step-ahead predictivesufficiency⇒ general predictive sufficiency

Get next-step distribution right by independence testingThen make states recursive

Assumes discrete observations, discrete time, finite causalstatesPaper: Shalizi and Klinkner (2004); C++ code,http://bactra.org/CSSR/

Cosma Shalizi Markovian Representations

Page 91: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

One-Step Ahead Prediction

Start with all histories in the same state

Given current partition of histories into states, test whethergoing one step further back into the past changes the next-stepconditional distributionUse a hypothesis test to control false positive rate

If yes, split that cell of the partition, but see if it matches anexisting distributionMust allow this merging or else lose minimality

If no match, add new cell to the partitionStop when no more divisions can be made or a maximumhistory length Λ is reachedFor consistency, Λ < log n

h1+ιfor some ι

Cosma Shalizi Markovian Representations

Page 92: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

One-Step Ahead Prediction

Start with all histories in the same stateGiven current partition of histories into states, test whethergoing one step further back into the past changes the next-stepconditional distribution

Use a hypothesis test to control false positive rate

If yes, split that cell of the partition, but see if it matches anexisting distributionMust allow this merging or else lose minimality

If no match, add new cell to the partitionStop when no more divisions can be made or a maximumhistory length Λ is reachedFor consistency, Λ < log n

h1+ιfor some ι

Cosma Shalizi Markovian Representations

Page 93: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

One-Step Ahead Prediction

Start with all histories in the same stateGiven current partition of histories into states, test whethergoing one step further back into the past changes the next-stepconditional distributionUse a hypothesis test to control false positive rate

If yes, split that cell of the partition, but see if it matches anexisting distributionMust allow this merging or else lose minimality

If no match, add new cell to the partitionStop when no more divisions can be made or a maximumhistory length Λ is reachedFor consistency, Λ < log n

h1+ιfor some ι

Cosma Shalizi Markovian Representations

Page 94: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

One-Step Ahead Prediction

Start with all histories in the same stateGiven current partition of histories into states, test whethergoing one step further back into the past changes the next-stepconditional distributionUse a hypothesis test to control false positive rate

If yes, split that cell of the partition, but see if it matches anexisting distributionMust allow this merging or else lose minimality

If no match, add new cell to the partitionStop when no more divisions can be made or a maximumhistory length Λ is reachedFor consistency, Λ < log n

h1+ιfor some ι

Cosma Shalizi Markovian Representations

Page 95: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Ensuring Recursive Transitions

Need to determinize a probabilistic automatonSeveral ways of doing this; technical and not worth going intohereTrickiest part of the algorithm and can influence thefinite-sample behavior

Cosma Shalizi Markovian Representations

Page 96: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Convergence

S = true causal state structureSn = structure reconstructed from n data pointsAssume: finite # of states, every state has a finite history, usinglong enough histories, technicalities:

Pr(Sn 6= S

)→ 0

D = true distribution, Dn = inferredError scales like independent samples

E[‖Dn −D‖TV

]= O(n−1/2)

Cosma Shalizi Markovian Representations

Page 97: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Convergence

S = true causal state structureSn = structure reconstructed from n data pointsAssume: finite # of states, every state has a finite history, usinglong enough histories, technicalities:

Pr(Sn 6= S

)→ 0

D = true distribution, Dn = inferredError scales like independent samples

E[‖Dn −D‖TV

]= O(n−1/2)

Cosma Shalizi Markovian Representations

Page 98: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Handwaving

Empirical conditional distributions for histories converge(large deviations principle for Markov chains)

Histories in the same state become harder to accidentallyseparateHistories in different states become harder to confuseEach state’s predictive distribution converges O(n−1/2)(from LDP again, take mixture)

Cosma Shalizi Markovian Representations

Page 99: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Example: The Even Process

2 1B | 1.0B | 0.5

A | 0.5

Cosma Shalizi Markovian Representations

Page 100: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

2 1B | 1.000B | 0.503

A | 0.497

reconstruction with Λ = 3, n = 1000, α = 0.005

Cosma Shalizi Markovian Representations

Page 101: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Some Uses

Geomagnetic fluctuations (Clarke et al., 2003)Natural language processing (Padró and Padró, 2005a,c,b,2007a,b)Anomaly detection (Friedlander et al., 2003a,b; Ray, 2004)Information sharing in networks (Klinkner et al., 2006; Shaliziet al., 2007)Social media propagation (Cointet et al., 2007)Neural spike train analysis (Haslinger et al., 2010)

Cosma Shalizi Markovian Representations

Page 102: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

About “Causal”

Term “causal states” introduced by Crutchfield and Young(1989)

without too much precisionAll about probabilistic prediction, not counterfactuals(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain

trajectories)

Still, those screening-off properties are really suggestive

Cosma Shalizi Markovian Representations

Page 103: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

About “Causal”

Term “causal states” introduced by Crutchfield and Young(1989) without too much precision

All about probabilistic prediction, not counterfactuals(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain

trajectories)

Still, those screening-off properties are really suggestive

Cosma Shalizi Markovian Representations

Page 104: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

About “Causal”

Term “causal states” introduced by Crutchfield and Young(1989) without too much precisionAll about probabilistic prediction, not counterfactuals

(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain

trajectories)

Still, those screening-off properties are really suggestive

Cosma Shalizi Markovian Representations

Page 105: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

About “Causal”

Term “causal states” introduced by Crutchfield and Young(1989) without too much precisionAll about probabilistic prediction, not counterfactuals(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain

trajectories)

Still, those screening-off properties are really suggestive

Cosma Shalizi Markovian Representations

Page 106: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

About “Causal”

Term “causal states” introduced by Crutchfield and Young(1989) without too much precisionAll about probabilistic prediction, not counterfactuals(selecting sub-ensembles of naturally-occurring trajectories, not enforcing certain

trajectories)

Still, those screening-off properties are really suggestive

Cosma Shalizi Markovian Representations

Page 107: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Back to Physics

(Shalizi and Moore, 2003)

Assume: Microscopic state Zt ∈ Z, with an evolution operator f

Assume: Micro-states support counterfactualsAssume: Never get to see Zt , instead deal with Xt = γ(Zt )Xt are coarse-grained, macroscopic variablesEach macrovariable gives a partition Γ of Z

Cosma Shalizi Markovian Representations

Page 108: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Back to Physics

(Shalizi and Moore, 2003)

Assume: Microscopic state Zt ∈ Z, with an evolution operator fAssume: Micro-states support counterfactuals

Assume: Never get to see Zt , instead deal with Xt = γ(Zt )Xt are coarse-grained, macroscopic variablesEach macrovariable gives a partition Γ of Z

Cosma Shalizi Markovian Representations

Page 109: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Back to Physics

(Shalizi and Moore, 2003)

Assume: Microscopic state Zt ∈ Z, with an evolution operator fAssume: Micro-states support counterfactualsAssume: Never get to see Zt , instead deal with Xt = γ(Zt )

Xt are coarse-grained, macroscopic variablesEach macrovariable gives a partition Γ of Z

Cosma Shalizi Markovian Representations

Page 110: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Back to Physics

(Shalizi and Moore, 2003)

Assume: Microscopic state Zt ∈ Z, with an evolution operator fAssume: Micro-states support counterfactualsAssume: Never get to see Zt , instead deal with Xt = γ(Zt )Xt are coarse-grained, macroscopic variablesEach macrovariable gives a partition Γ of Z

Cosma Shalizi Markovian Representations

Page 111: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Sequences of Xt values refine Γ

Γ(T ) =T∧

t=1

f−t Γ

ε partitions histories of X∴ ε joins cells of Γ(∞)

∴ ε induces a partition ∆ of ZThis is a new, Markovian coarse-grained variable

Cosma Shalizi Markovian Representations

Page 112: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Sequences of Xt values refine Γ

Γ(T ) =T∧

t=1

f−t Γ

ε partitions histories of X

∴ ε joins cells of Γ(∞)

∴ ε induces a partition ∆ of ZThis is a new, Markovian coarse-grained variable

Cosma Shalizi Markovian Representations

Page 113: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Sequences of Xt values refine Γ

Γ(T ) =T∧

t=1

f−t Γ

ε partitions histories of X∴ ε joins cells of Γ(∞)

∴ ε induces a partition ∆ of ZThis is a new, Markovian coarse-grained variable

Cosma Shalizi Markovian Representations

Page 114: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Sequences of Xt values refine Γ

Γ(T ) =T∧

t=1

f−t Γ

ε partitions histories of X∴ ε joins cells of Γ(∞)

∴ ε induces a partition ∆ of Z

This is a new, Markovian coarse-grained variable

Cosma Shalizi Markovian Representations

Page 115: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Sequences of Xt values refine Γ

Γ(T ) =T∧

t=1

f−t Γ

ε partitions histories of X∴ ε joins cells of Γ(∞)

∴ ε induces a partition ∆ of ZThis is a new, Markovian coarse-grained variable

Cosma Shalizi Markovian Representations

Page 116: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Causality

Interventions moving z from one cell of ∆ to another changesthe distribution of X∞t+1

Changing z inside a cell of ∆ might still make a difference“There must be at least this much structure”

Cosma Shalizi Markovian Representations

Page 117: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Causality

Interventions moving z from one cell of ∆ to another changesthe distribution of X∞t+1Changing z inside a cell of ∆ might still make a difference

“There must be at least this much structure”

Cosma Shalizi Markovian Representations

Page 118: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Connecting to Causality

Interventions moving z from one cell of ∆ to another changesthe distribution of X∞t+1Changing z inside a cell of ∆ might still make a difference“There must be at least this much structure”

Cosma Shalizi Markovian Representations

Page 119: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Summary

Your stochastic process has a unique, minimal Markovianrepresentation

This representation has nice predictive propertiesCan reconstruct from sample data in some cases. . .and a lot more could be done in this lineThe Markov states have the right screening-off propertiesfor causal models

Cosma Shalizi Markovian Representations

Page 120: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Summary

Your stochastic process has a unique, minimal MarkovianrepresentationThis representation has nice predictive properties

Can reconstruct from sample data in some cases. . .and a lot more could be done in this lineThe Markov states have the right screening-off propertiesfor causal models

Cosma Shalizi Markovian Representations

Page 121: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Summary

Your stochastic process has a unique, minimal MarkovianrepresentationThis representation has nice predictive propertiesCan reconstruct from sample data in some cases. . .

and a lot more could be done in this lineThe Markov states have the right screening-off propertiesfor causal models

Cosma Shalizi Markovian Representations

Page 122: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Summary

Your stochastic process has a unique, minimal MarkovianrepresentationThis representation has nice predictive propertiesCan reconstruct from sample data in some cases. . .and a lot more could be done in this line

The Markov states have the right screening-off propertiesfor causal models

Cosma Shalizi Markovian Representations

Page 123: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Summary

Your stochastic process has a unique, minimal MarkovianrepresentationThis representation has nice predictive propertiesCan reconstruct from sample data in some cases. . .and a lot more could be done in this lineThe Markov states have the right screening-off propertiesfor causal models

Cosma Shalizi Markovian Representations

Page 124: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Clarke, Richard W., Mervyn P. Freeman and Nicholas W.Watkins (2003). “Application of Computational Mechanics tothe Analysis of Natural Data: An Example in Geomagnetism.”Physical Review E , 67: 0126203. URLhttp://arxiv.org/abs/cond-mat/0110228.

Cointet, Jean-Philippe, Emmanuel Faure and Camille Roth(2007). “Intertemporal topic correlations in online media.” InProceedings of the International Conference on Weblogs andSocial Media [ICWSM] . Boulder, CO, USA. URLhttp://camille.roth.free.fr/travaux/cointetfaureroth-icwsm-cr4p.pdf.

Crutchfield, James P. and Karl Young (1989). “InferringStatistical Complexity.” Physical Review Letters, 63:105–108. URL http://www.santafe.edu/~cmg/compmech/pubs/ISCTitlePage.htm.

Cosma Shalizi Markovian Representations

Page 125: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Friedlander, David S., Shashi Phoha and Richard Brooks(2003a). “Determination of Vehicle Behavior based onDistributed Sensor Network Data.” In Advanced SignalProcessing Algorithms, Architectures, and ImplementationsXIII (Franklin T. Luk, ed.), vol. 5205 of Proceedings of theSPIE . Bellingham, WA: SPIE. Presented at SPIE’s 48thAnnual Meeting, 3–8 August 2003, San Diego, CA.

Friedlander, Davis S., Isanu Chattopadhayay, Asok Ray, ShashiPhoha and Noah Jacobson (2003b). “Anomaly Prediction inMechanical System Using Symbolic Dynamics.” InProceedings of the 2003 American Control Conference,Denver, CO, 4–6 June 2003.

Gács, Péter, John T. Tromp and Paul M. B. Vitanyi (2001).“Algorithmic Statistics.” IEEE Transactions on InformationTheory , 47: 2443–2463. URL

Cosma Shalizi Markovian Representations

Page 126: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

http://arxiv.org/abs/math.PR/0006233.Grassberger, Peter (1986). “Toward a Quantitative Theory of

Self-Generated Complexity.” International Journal ofTheoretical Physics, 25: 907–938.

Haslinger, Robert, Kristina Lisa Klinkner and Cosma RohillaShalizi (2010). “The Computational Structure of SpikeTrains.” Neural Computation, 22: 121–157. URLhttp://arxiv.org/abs/1001.0036.doi:10.1162/neco.2009.12-07-678.

Iosifescu, Marius and Serban Grigorescu (1990). Dependencewith Complete Connections and Its Applications. Cambridge,England: Cambridge University Press. Revised paperbackprinting, 2009.

Jaeger, Herbert (2000). “Observable Operator Models forDiscrete Stochastic Time Series.” Neural Computation, 12:

Cosma Shalizi Markovian Representations

Page 127: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

1371–1398. URL http://www.faculty.iu-bremen.de/hjaeger/pubs/oom_neco00.pdf.

Jänicke, Heike, Alexander Wiebel, Gerik Scheuermann andWolfgang Kollmann (2007). “Multifield Visualization UsingLocal Statistical Complexity.” IEEE Transactions onVisualization and Computer Graphics, 13: 1384–1391. URLhttp://www.informatik.uni-leipzig.de/bsv/Jaenicke/Papers/vis07.pdf.doi:10.1109/TVCG.2007.70615.

Kantz, Holger and Thomas Schreiber (2004). Nonlinear TimeSeries Analysis. Cambridge, England: Cambridge UniversityPress, 2nd edn.

Klinkner, Kristina Lisa, Cosma Rohilla Shalizi and Marcelo F.Camperi (2006). “Measuring Shared Information andCoordinated Activity in Neuronal Networks.” In Advances in

Cosma Shalizi Markovian Representations

Page 128: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Neural Information Processing Systems 18 (NIPS 2005) (YairWeiss and Bernhard Schölkopf and John C. Platt, eds.), pp.667–674. Cambridge, Massachusetts: MIT Press. URLhttp://arxiv.org/abs/q-bio.NC/0506009.

Knight, Frank B. (1975). “A Predictive View of Continuous TimeProcesses.” Annals of Probability , 3: 573–596. URL http://projecteuclid.org/euclid.aop/1176996302.

— (1992). Foundations of the Prediction Process. Oxford:Clarendon Press.

Langford, John, Ruslan Salakhutdinov and Tong Zhang (2009).“Learning Nonlinear Dynamic Models.” Electronic preprint.URL http://arxiv.org/abs/0905.3369.

Littman, Michael L., Richard S. Sutton and Satinder Singh(2002). “Predictive Representations of State.” In Advances inNeural Information Processing Systems 14 (NIPS 2001)

Cosma Shalizi Markovian Representations

Page 129: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

(Thomas G. Dietterich and Suzanna Becker and ZoubinGhahramani, eds.), pp. 1555–1561. Cambridge,Massachusetts: MIT Press. URL http://www.eecs.umich.edu/~baveja/Papers/psr.pdf.

Onicescu, Octav and Gheorghe Mihoc (1935). “Sur les chaînesde variables statistiques.” Comptes Rendus de l’Académiedes Sciences de Paris, 200: 511–512.

Packard, Norman H., James P. Crutchfield, J. Doyne Farmerand Robert S. Shaw (1980). “Geometry from a Time Series.”Physical Review Letters, 45: 712–716.

Padró, Muntsa and Lluís Padró (2005a). “Applying a FiniteAutomata Acquisition Algorithm to Named EntityRecognition.” In Proceedings of 5th International Workshopon Finite-State Methods and Natural Language Processing

Cosma Shalizi Markovian Representations

Page 130: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

(FSMNLP’05). URL http://www.lsi.upc.edu/~nlp/papers/2005/fsmnlp05-pp.pdf.

— (2005b). “Approaching Sequential NLP Tasks with anAutomata Acquisition Algorithm.” In Proceedings ofInternational Conference on Recent Advances in NLP(RANLP’05). URL http://www.lsi.upc.edu/~nlp/papers/2005/ranlp05-pp.pdf.

— (2005c). “A Named Entity Recognition System Based on aFinite Automata Acquisition Algorithm.” Procesamiento delLenguaje Natural , 35: 319–326. URL http://www.lsi.upc.edu/~nlp/papers/2005/sepln05-pp.pdf.

— (2007a). “ME-CSSR: an Extension of CSSR using MaximumEntropy Models.” In Proceedings of Finite State Methods forNatural Language Processing (FSMNLP) 2007 . URL

Cosma Shalizi Markovian Representations

Page 131: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

http://www.lsi.upc.edu/%7Enlp/papers/2007/fsmnlp07-pp.pdf.

— (2007b). “Studying CSSR Algorithm Applicability on NLPTasks.” Procesamiento del Lenguaje Natural , 39: 89–96.URL http://www.lsi.upc.edu/%7Enlp/papers/2007/sepln07-pp.pdf.

Ray, Asok (2004). “Symbolic dynamic analysis of complexsystems for anomaly detection.” Signal Processing, 84:1115–1130.

Salmon, Wesley C. (1971). Statistical Explanation andStatistical Relevance. Pittsburgh: University of PittsburghPress. With contributions by Richard C. Jeffrey and James G.Greeno.

— (1984). Scientific Explanation and the Causal Structure ofthe World . Princeton: Princeton University Press.

Cosma Shalizi Markovian Representations

Page 132: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Shalizi, Cosma Rohilla (2001). Causal Architecture, Complexityand Self-Organization in Time Series and Cellular Automata.Ph.D. thesis, University of Wisconsin-Madison. URLhttp://bactra.org/thesis/.

— (2003). “Optimal Nonlinear Prediction of Random Fields onNetworks.” Discrete Mathematics and Theoretical ComputerScience, AB(DMCS): 11–30. URLhttp://arxiv.org/abs/math.PR/0305160.

Shalizi, Cosma Rohilla, Marcelo F. Camperi and Kristina LisaKlinkner (2007). “Discovering Functional Communities inDynamical Networks.” In Statistical Network Analysis:Models, Issues, and New Directions (Edo Airoldi andDavid M. Blei and Stephen E. Fienberg and AnnaGoldenberg and Eric P. Xing and Alice X. Zheng, eds.), vol.4503 of Lecture Notes in Computer Science, pp. 140–157.

Cosma Shalizi Markovian Representations

Page 133: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

New York: Springer-Verlag. URLhttp://arxiv.org/abs/q-bio.NC/0609008.

Shalizi, Cosma Rohilla and James P. Crutchfield (2001).“Computational Mechanics: Pattern and Prediction, Structureand Simplicity.” Journal of Statistical Physics, 104: 817–879.URL http://arxiv.org/abs/cond-mat/9907176.

Shalizi, Cosma Rohilla, Robert Haslinger, Jean-BaptisteRouquier, Kristina Lisa Klinkner and Cristopher Moore(2006). “Automatic Filters for the Detection of CoherentStructure in Spatiotemporal Systems.” Physical Review E ,73: 036104. URLhttp://arxiv.org/abs/nlin.CG/0508001.

Shalizi, Cosma Rohilla and Kristina Lisa Klinkner (2004). “BlindConstruction of Optimal Nonlinear Recursive Predictors forDiscrete Sequences.” In Uncertainty in Artificial Intelligence:

Cosma Shalizi Markovian Representations

Page 134: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Proceedings of the Twentieth Conference (UAI 2004) (MaxChickering and Joseph Y. Halpern, eds.), pp. 504–511.Arlington, Virginia: AUAI Press. URLhttp://arxiv.org/abs/cs.LG/0406011.

Shalizi, Cosma Rohilla, Kristina Lisa Klinkner and RobertHaslinger (2004). “Quantifying Self-Organization withOptimal Predictors.” Physical Review Letters, 93: 118701.URL http://arxiv.org/abs/nlin.AO/0409024.

Shalizi, Cosma Rohilla and Cristopher Moore (2003). “What Isa Macrostate? From Subjective Measurements to ObjectiveDynamics.” Electronic pre-print. URLhttp://arxiv.org/abs/cond-mat/0303625.

Takens, Floris (1981). “Detecting Strange Attractors in FluidTurbulence.” In Symposium on Dynamical Systems and

Cosma Shalizi Markovian Representations

Page 135: Markovian (and conceivably causal) representations of ...cshalizi/Talks/2010-07-10-UAI.pdf · Causality References Notation and setting “Causal” States (Crutchfield and Young,

Set-UpOptimality Properties

ReconstructionCausality

References

Turbulence (D. A. Rand and L. S. Young, eds.), pp. 366–381.Berlin: Springer-Verlag.

Tishby, Naftali, Fernando C. Pereira and William Bialek (1999).“The Information Bottleneck Method.” In Proceedings of the37th Annual Allerton Conference on Communication, Controland Computing (B. Hajek and R. S. Sreenivas, eds.), pp.368–377. Urbana, Illinois: University of Illinois Press. URLhttp://arxiv.org/abs/physics/0004057.

Cosma Shalizi Markovian Representations