Theory Theory—
How We Come to Understand Our World
James P. Crutchfield
www.santafe.edu/~chaos
3 May 2002
Abstract: We are confronted, as never before, with an explosion in the size
of databases: from data-mining the web to dynamical brain imaging, the various
genome projects, and monitoring the millions of shipping containers that enter
our ports. These datasets are so vast that there is no conceivable way humans
can directly examine all of the data. What has been an intuitive and very human
discovery process—hypothesizing structures and building predictive theories about
the systems that produce them—must now be automated. But do we understand
the notions of structure and pattern well enough that we can teach machines to
discover them? How do we build theories that capture patterns in useful and
predictive ways? I will announce some very recent results on automated pattern
discovery and theory building for cellular automata. The lessons hint at what a
future "artificial" science might look like.
Joint work with Carl McTague.
Data Explosion:
• Neurophysiology: multiple neuron recordings (> 100 neurons @ 1kHz)
• Web Data-Mining: 10-100 GB/day, multi-terabyte databases
• Astrophysical data: Hubble, EUV, ...
• Neuroimaging: MEG, 100-200 SQUIDs @ 1 kHz
• Geophysics: Earthquake monitoring with 1000s of sensors of different kinds
• BioInformatics: Genome projects, microarray sequencing, ...
• Searchable Video Databases
• Very large-scale simulations: weather, hydrodynamics, reaction kinetics, ...
• ...
Need machines to help.
Current approach is Pattern Recognition:
• Match data against existing palette of templates
• Fitting polynomials, assuming data are IID and distributions are Gaussian, taking Fourier/Laplace transforms, ...
Begs the Question:
Where did that palette come from in the first place?
How is this done now?
Baconian Scientific Algorithm:
Select Domain → Hypothesize → Test → Refine (and loop back)
Hypothesis generation: Guessing within a discipline’s domain.
Must we always “guess” or intuit?
Problem: Hypothesis wrong? Little hint of where to go next.
The Inverse Problem: How to go from Data to Theory?
History of the Inverse Problem
Recently in nonlinear physics,
Attractor reconstruction:
• Packard et al (1980): “Geometry from a Time Series”
• Takens (1981), ...
“Theory From Time Series”:
• JPC/McNamara (1987): Equations of Motion from a Data Series
• Farmer/Sidorowich (1987): Nonlinear prediction
• Casdagli (1989), ...
Problem with this, too, alas.
The Underlying Problem: Intrinsic Representation
Reformulating the Question:
Does data contain information about an appropriate representation?
Solution:
Tackle head on: Define structure, pattern, regularity, ...
Computational Mechanics:
Accounts for computational structure.
Extends Statistical Mechanics: more than “statistics”.
Analogy:
Physics “accounts” for energy flow and transduction.
Computational Mechanics accounts for
1. Amount of historical information stored in process;
2. The architecture supporting that storage; and
3. How stored information is transformed into future states.
More directly:
1. Physics has measures of disorder: temperature, thermodynamic entropy.
2. Comp’l mechanics measures degrees of pattern: structural complexity.
Causal Architecture: A Review of Computational Mechanics
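Processes: $\overleftrightarrow{S} = \ldots S_{-1} S_0 S_1 \ldots = \overleftarrow{S}\,\overrightarrow{S}$.
Causal States $\mathcal{S}$: Sets of pasts $\overleftarrow{S}$ that are equally predictive about the future $\overrightarrow{S}$:
$\overleftarrow{s} \sim \overleftarrow{s}'$ such that
$P(\overrightarrow{S} \mid \overleftarrow{S} = \overleftarrow{s}) = P(\overrightarrow{S} \mid \overleftarrow{S} = \overleftarrow{s}')$ (1)
ε-Machine: $M = \{\mathcal{S}, \mathcal{T}\}$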
• Theorem: M is the unique, optimal predictor of minimal size.
• Theorem: M is a minimal sufficient statistic for a process.
• The intrinsic representation of a process.
Stored information is Statistical Complexity: $C_\mu = H[P(\mathcal{S})]$.
JPC & K Young, Physical Review Letters 63 (1989) 105–108.
CR Shalizi & JPC, J. Statistical Physics 104 (2001) 817–879.
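The causal-state equivalence relation and the statistical complexity can be illustrated with a small sketch. It assumes a toy process that is not in the talk (binary sequences with no two consecutive 1s, where a 0 is followed by 0 or 1 with equal probability), and it uses the next-symbol distribution as a stand-in for the full future distribution, which is adequate only because this toy process is first-order Markov. The function names are hypothetical.

```python
from collections import defaultdict
from math import log2

def causal_states(history_prob, future_given_history, ndigits=9):
    """Group histories whose conditional future distributions coincide."""
    groups = defaultdict(list)
    for history, dist in future_given_history.items():
        # Canonical, hashable form of the conditional distribution.
        key = tuple(sorted((sym, round(p, ndigits)) for sym, p in dist.items() if p > 0))
        groups[key].append(history)
    states = [tuple(sorted(hs)) for hs in groups.values()]
    probs = [sum(history_prob[h] for h in hs) for hs in states]
    return states, probs

def statistical_complexity(state_probs):
    """Shannon entropy (in bits) of the causal-state distribution."""
    return -sum(p * log2(p) for p in state_probs if p > 0)

# Assumed toy data: stationary probabilities of the allowed length-2 histories
# and the next-symbol distribution that follows each of them.
history_prob = {"00": 1 / 3, "10": 1 / 3, "01": 1 / 3}
future_given_history = {
    "00": {"0": 0.5, "1": 0.5},
    "10": {"0": 0.5, "1": 0.5},
    "01": {"0": 1.0},
}

states, probs = causal_states(history_prob, future_given_history)
c_mu = statistical_complexity(probs)
print(states, probs, round(c_mu, 4))
```

Histories ending in 0 collapse into one causal state and the history ending in 1 forms the other, so the stored information is the entropy of a (2/3, 1/3) distribution.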
Cellular Automata: Artificial Physics’s
[Figure: Schematic of a cellular automaton. A lattice of N sites holds the global state $S_t$. Each site's local state $s_i$ and its neighborhood (radius = 1) index the look-up table φ, whose output bit string defines the rule. The global map Φ takes global state $S_t$ to $S_{t+1}$.]
[Figure: Spacetime diagram of Elementary Cellular Automaton 18, sites 0-49, time 0-49.]
ECA 18 Rule Table:
$\eta_t$                        000 001 010 011 100 101 110 111
$s_{t+1} = \phi(\eta_t)$         0   1   0   0   1   0   0   0
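The rule table can be generated from the rule number using Wolfram's standard numbering, in which neighborhood $\eta$, read as a 3-bit integer, selects the corresponding bit of the rule number. The sketch below is a minimal illustration, assuming a periodic lattice; the function names `rule_table` and `step` are hypothetical.

```python
def rule_table(rule_number):
    """Look-up table phi: neighborhood string -> output bit (Wolfram numbering)."""
    return {format(n, "03b"): (rule_number >> n) & 1 for n in range(8)}

def step(config, rule_number):
    """Apply the global map Phi once to a periodic lattice (list of 0/1)."""
    table = rule_table(rule_number)
    n = len(config)
    return [
        table[f"{config[(i - 1) % n]}{config[i]}{config[(i + 1) % n]}"]
        for i in range(n)
    ]

table18 = rule_table(18)
nxt = step([0, 0, 0, 1, 0, 0, 0, 0], 18)
print(table18)
print(nxt)
```

For ECA 18 only the neighborhoods 001 and 100 map to 1, so a single 1 on a quiescent background splits into two 1s flanking its old position.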
Cellular Automata Computational Mechanics
[Diagram: Φ maps a set $\Omega \subseteq A^*$ to $\Phi(\Omega) \subseteq A^*$.]
Space of configurations $A^*$: Represented by finite-state automata.
CA LUT: A finite-state transducer $T_\Phi$ that reads in sets $\Omega_t$ and outputs $\Omega_{t+1}$.
Language evolution: Implemented by the Finite-Machine Evolution (FME) Operator Φ:
(i) Φ(Ω),
(ii) Drop Transducer Inputs,
(iii) Strip Transient States,
(iv) Convert NFA → DFA, and
(v) Minimize.
Computational Complexity, starting with n-state language:
• Φ(Ω) and Drop Inputs: O(n).
• Identify Transient States: $O(n^2)$.
• NFA → DFA: $O(2^n)$.
• Minimize: $O(n \log_2 n)$.
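The exponential cost of the NFA → DFA step comes from the subset construction, where each DFA state is a set of NFA states. The sketch below is a generic textbook version, not the talk's implementation. The example NFA (strings whose third-from-last symbol is 1) is an assumption chosen because it is known to blow up. Accepting states are omitted because, for process languages, every state is both a start and a final state.

```python
def nfa_to_dfa(alphabet, delta, starts):
    """Subset construction.

    delta maps (state, symbol) -> iterable of successor states.
    Returns the DFA start state and a dict: subset -> {symbol: subset}.
    """
    start = frozenset(starts)
    dfa = {}
    frontier = [start]
    while frontier:
        subset = frontier.pop()
        if subset in dfa:
            continue
        dfa[subset] = {}
        for a in alphabet:
            target = frozenset(q2 for q in subset for q2 in delta.get((q, a), ()))
            if target:  # an empty target means the word is rejected
                dfa[subset][a] = target
                frontier.append(target)
    return start, dfa

# Assumed example: 4-state NFA that guesses when the third-from-last symbol is 1.
delta = {
    (0, "0"): [0], (0, "1"): [0, 1],
    (1, "0"): [2], (1, "1"): [2],
    (2, "0"): [3], (2, "1"): [3],
}
start, dfa = nfa_to_dfa("01", delta, [0])
print(len(dfa), "reachable DFA states from a 4-state NFA")
```

Here 4 nondeterministic states produce 8 reachable deterministic ones. Adding one more "remember" state doubles the count again, which is the $O(2^n)$ behavior listed above.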
Wolfram (1984): Iterate all configurations $\Omega_0 = A^*$:
$\Omega_t = \Phi^t \Omega_0$. (2)
Regular Language Complexity: $|\Omega_t|$
t   ECA 18     ECA 22
1   5          15
2   47         280
3   143        4506
4   ≥ 20,000   ≥ 20,000
A Structural View
Domain: Spacetime shift-invariant pattern
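1. $\Lambda = \Phi^p(\Lambda)$, for some $p > 0$.
2. $\Lambda = \sigma^m(\Lambda)$, for some $m > 0$.
Particle:
1. $\Lambda^i \alpha \Lambda^j$, $\alpha \in A^*$.
2. $\alpha = \Phi^p(\alpha)$, for some $p > 0$.
Particle Interactions:
1. $\alpha + \beta \to \gamma$.
2. $\alpha + \nu \to \emptyset$.
3. ...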
Example: ECA 18
[Figure: Spacetime diagram of Elementary Cellular Automaton 18, sites 0-99, time 0-99.]
Regularity: Regions of (0A)∗ ... A domain?
Iteration of ECA 18's Domain
[Figure: The hypothesized domain, the ECA 18 transducer, and their composition.]
[Figure: The composition after dropping inputs, stripping transients, and minimizing. QED.]
Recognizing Structure: Domain Filtering
Factor out that data explained by structures one knows.
[Figure: Spacetime diagram of Elementary Cellular Automaton 18, sites 0-99, time 0-99.]
[Figure: ECA 18: Domain Filtered, sites 0-99, time 0-99.]
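A rough idea of what domain filtering does can be conveyed with a parity test. The sketch assumes the ECA 18 domain is $(0\Sigma)^*$, as in the particle catalog: inside one domain region all 1s sit on sites of the same parity, so two consecutive 1s an odd distance apart must straddle a domain break. This is a simplified stand-in, not the transducer-based filter behind the figure, and `filter_eca18_row` is a hypothetical name.

```python
def filter_eca18_row(row):
    """Mark sites lying between consecutive 1s whose spacing breaks (0 Sigma)* parity.

    Returns a mask of the same length: 0 = explained by the domain,
    1 = part of a stretch that contains a domain break.
    """
    ones = [i for i, v in enumerate(row) if v == 1]
    mask = [0] * len(row)
    for a, b in zip(ones, ones[1:]):
        if (b - a) % 2 == 1:  # odd spacing: the two 1s are in different phases
            for i in range(a, b + 1):
                mask[i] = 1
    return mask

row = [0, 1, 0, 1, 0, 0, 1, 0, 1]
mask = filter_eca18_row(row)
print(mask)
```

Applied row by row to a spacetime diagram, the mask leaves only the stretches where the domain description fails, which is where the particles live.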
ECA 18's Particle Catalog
Domain: Process Language
Λ0: (0Σ)∗
Particle: Wedges
α: Λ002n+10Λ0
Interactions:
α + α → ∅
The Enigma: ECA 22
ECA 22 Lookup Table:
$\eta_t$                        000 001 010 011 100 101 110 111
$s_{t+1} = \phi(\eta_t)$         0   1   1   0   1   0   0   0
[Figure: Spacetime diagram of ECA 22, sites 0-99, time 0-99.]
History: highly disordered (high entropy) but ...
• Moore: "... nonlocal structures ..."
• Grassberger (1986): "... long-range, complex correlations ..."; power-law decays.
Iterate of ECA 22's Candidate Domain
[Figure: The candidate domain, the ECA 22 transducer, and their composition.]
[Figure: The composition after dropping inputs, stripping transients, and minimizing.]
ECA 22 Domain Proof?
But ≠ Candidate!
Second Iterate of ECA 22's Candidate Domain
[Figure: The candidate's iterate, the ECA 22 transducer, and their composition.]
[Figure: The composition after dropping inputs, stripping transients, and minimizing.]
ECA 22 Domain Proof Completed
QED!
Theorem: $\{\Lambda^0_0, \Lambda^0_1\}$ is a spacetime domain for ECA 22.
Proof:
(i) $\Lambda^0_1 = \Phi(\Lambda^0_0)$
(ii) $\Lambda^0_0 = \Phi(\Lambda^0_1)$
in other words
(iii) $\Lambda^0_0 = \Phi^2(\Lambda^0_0)$
(iv) $\Lambda^0_1 = \Phi^2(\Lambda^0_1)$
[Figure: The machines for $\Lambda^0_0$ and $\Lambda^0_1$.]
This is the first structure discovered for ECA 22.
Captures much of ECA 22's behavior.
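The two-step invariance can be spot-checked numerically. The sketch takes the process languages from the particle catalog, $(000\Sigma)^*$ and $(1110 + 0000)^*$, builds a random periodic configuration of the first kind, and applies ECA 22 twice. The periodic lattice, the helper names, and the phase bookkeeping (which site of each 4-block is free) are assumptions made for illustration. A passing check on sample configurations is evidence, not the automata-theoretic proof shown in the talk.

```python
import random

def eca_step(config, rule_number):
    """One step of an elementary CA on a periodic lattice."""
    n = len(config)
    return [
        (rule_number >> (4 * config[(i - 1) % n] + 2 * config[i] + config[(i + 1) % n])) & 1
        for i in range(n)
    ]

def is_000S(config, r):
    """True if every site not congruent to r (mod 4) is 0, i.e. (000 Sigma)* with free phase r."""
    return all(v == 0 for i, v in enumerate(config) if i % 4 != r)

def is_xxx0(config, r):
    """True if sites congruent to r (mod 4) are 0 and each following 3-block is 000 or 111."""
    n = len(config)
    for i in range(r, n, 4):
        if config[i] != 0:
            return False
        if len({config[(i + k) % n] for k in (1, 2, 3)}) != 1:
            return False
    return True

rng = random.Random(1)
blocks = 25
c0 = []
for _ in range(blocks):
    c0 += [0, 0, 0, rng.randint(0, 1)]  # a word of (000 Sigma)*, free sites at phase 3

c1 = eca_step(c0, 22)
c2 = eca_step(c1, 22)
print(is_000S(c0, 3), is_xxx0(c1, 1), is_000S(c2, 1))
```

One step turns each free bit x into the block xxx followed by a 0, and the second step returns to three 0s followed by a free bit, with the pattern shifted in space.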
The Enigma? ECA 22
[Figure: Two spacetime diagrams of ECA 22, sites 0-286, time 0-99.]
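ECA 22's Particle Catalog
Domains: Process Language
$\Lambda^0$: $\Lambda^0_0 = (000\Sigma)^*$, $\Lambda^0_1 = (1110 + 0000)^*$
$\Lambda^1$: $0^*$
$\Lambda^2$: $(01)^*$
$\Lambda^3$: $(0011)^*$
Particles: Wedges
α: $\Lambda^0_0 A 001 A \Lambda^0_0$; $\Lambda^0_1 A' 1111 B' \Lambda^0_1$
β: $\Lambda^0_0 A 01 A \Lambda^0_0$; $\Lambda^0_1 A' 110 B' \Lambda^0_1$; $\Lambda^0_1 A' 001 B' \Lambda^0_1$
Interactions:
α + α → β
β + β → ∅
α + β → $\Lambda^{\alpha-\beta}$ gas
β + α → $\Lambda^{\alpha-\beta}$ gas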
ECA 22: Pattern Discovery
Theorem: $\{\Lambda^0_0, \Lambda^0_1\}$ is the only (positive entropy) domain within the complexity horizon (up to and including 6-state candidates).
Proof: Automated proof methods using Haskell, a functional programming language.
How was this done?
Candidates for Domains: Process Languages
Finite-state process language: represented by (recurrent portion of) ε-machine.
AKA finite-state machine with one strongly connected component; all states are start and final states.
n-DCL Library: The domain candidates with n states.
How many are there?
States n   n-DCL Library Size
1          3
2          7
3          78
4          1,388
5          35,186
6          1,132,613
Automated Search
1. For all $\Lambda \in$ n-DCL, exclude by counterexample:
(a) Generate long $s \in \Lambda$.
(b) Iterate this single configuration: $\Phi^t(s)$.
(c) Test: If $\Phi^t(s) \notin \Lambda$, then Λ is expanding.
2. For all nonexpanding n-DCL, exclude by entropy rate:
$h_\mu(\Phi^t(\Lambda)) < h_\mu(\Lambda) \Rightarrow$ Candidate contracts. (3)
3. For all nonexpanding, noncontracting n-DCL, attempt to directly prove the invariance theorem: $\Lambda = \Phi^p \Lambda$?
Λ that pass tests are domains.
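Step 1 of the search can be sketched as follows. A candidate is assumed to be given as a deterministic machine in which every state is both start and final, encoded here as a dict `state -> {symbol: next_state}`; that encoding and all function names are hypothetical. The talk's search was implemented in Haskell, and this Python version only illustrates the idea. A word belongs to the process language if it can be read starting from some state.

```python
import random

def eca_image(word, rule_number):
    """Image of a finite word under an elementary CA (loses one cell at each end)."""
    return [
        (rule_number >> (4 * a + 2 * b + c)) & 1
        for a, b, c in zip(word, word[1:], word[2:])
    ]

def random_word(machine, length, rng):
    """Random walk on the candidate machine, emitting a word of the language."""
    state = rng.choice(sorted(machine))
    word = []
    for _ in range(length):
        symbol = rng.choice(sorted(machine[state]))
        word.append(symbol)
        state = machine[state][symbol]
    return word

def in_language(machine, word):
    """True if the word can be read from some state (all states start and final)."""
    current = set(machine)
    for symbol in word:
        current = {machine[q][symbol] for q in current if symbol in machine[q]}
        if not current:
            return False
    return True

def expands(machine, rule_number, t=1, length=200, seed=0):
    """Counterexample test: does the t-th iterate of a sampled word leave the language?"""
    word = random_word(machine, length, random.Random(seed))
    for _ in range(t):
        word = eca_image(word, rule_number)
    return not in_language(machine, word)

ZERO_SIGMA = {"A": {0: "B"}, "B": {0: "A", 1: "A"}}        # (0 Sigma)*
ZERO_ONE = {"A": {0: "B"}, "B": {1: "A"}}                  # (01)*
C000S = {1: {0: 2}, 2: {0: 3}, 3: {0: 4}, 4: {0: 1, 1: 1}}  # (000 Sigma)*

print(expands(ZERO_SIGMA, 18), expands(ZERO_ONE, 18), expands(C000S, 22, t=2))
```

Returning `False` only means no counterexample was found for that sample, so surviving candidates still need the entropy-rate test and the direct invariance proof.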
ECA 22: Table of a Million Theorems
States n   n-DCL Size   Nonexpanding   Contracting       Domains
1          3            2              $\Sigma^*$        $\Lambda^1 = 0^*$
2          7            2              $(0\Sigma)^*$     $\Lambda^2 = (01)^*$
3          78           0
4          1,388        2                                $\Lambda^0_0$, $\Lambda^3 = (0011)^*$
5          35,186       5              (listed below)
6          1,132,613    268            267               $\Lambda^0_1$
Contracting 5-state candidates:
((11)(11 + 01)∗(10) + 00)∗
((101)(00+ 1) + (1(00+ 1) + 0))∗
([(101)+(1(00 + 1) + 00)] + [1(00+ 1) + 0])∗
([1+(01)(00 + 1)] + [1+(00) + 0])∗
([(1+01)+(1+00+ 00)] + [1+00+ 0])∗
QED
Several months ago: Estimated compute time ∼ 5 Beowulf years!
When finally proved, proof took ∼ 1 week: speed = 7000 tph!
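The quoted speed is consistent with the library sizes, under two assumptions that the slide does not state: "tph" is read as theorems per hour, and "about 1 week" is taken as 168 hours, with every candidate in the 1- through 6-state libraries counted as one theorem.

```python
# n-DCL library sizes as listed in the talk's table.
library_sizes = {1: 3, 2: 7, 3: 78, 4: 1388, 5: 35186, 6: 1132613}

total_candidates = sum(library_sizes.values())
hours_in_week = 7 * 24  # assumed reading of "about 1 week"
rate = total_candidates / hours_in_week

print(total_candidates, "candidates,", round(rate), "per hour")
```

The total is a little over a million candidates, which also matches the slide title.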
ECA 22's Last Candidate
[Figure: The last candidate and the ECA 22 transducer.]
Iterate of the Last Candidate: 161 states and 318 transitions
Theorem: Not a domain.
Proof: Nonexpanding + $h(\Phi(\Lambda)) < h(\Lambda)$ ⇒ Candidate contracts.
The Near Term: All Elementary CAs
Push of a Button:
1. CA Pattern Discoverer will automatically build structural theories for all 256 ECAs.
2. Result: a website of about 1000 pages, with several dozen
pages per ECA.
Beware of Prophets Expounding the CA Gospel!
The Future: Artificial Science
Implications:
1. Artificial Particle Physics.
2. Automated Pattern Discovery.
3. Theorists unemployed?