Symmetry Breaking Bifurcations of the Information Distortion
Dissertation Defense, April 8, 2003
Albert E. Parker III
Complex Biological Systems Department of Mathematical Sciences
Center for Computational Biology
Montana State University
Goal: Solve the Information Distortion Problem
The goal of my thesis is to solve the Information Distortion problem, an optimization problem of the form
$\max_{q \in \Delta} G(q)$ constrained by $D(q) \ge D_0$
where
• $\Delta$ is a subset of $\mathbb{R}^n$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.
Problems of this form arise in the study of clustering problems or optimal source coding systems.
Goal: Another Formulation
Using the method of Lagrange multipliers, the goal of finding solutions of the optimization problem can be rephrased as finding stationary points of the problem
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where $\beta \in [0,\infty)$.
• $\Delta$ is a subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.
How: Determine the Bifurcation Structure
We have described the bifurcation structure of stationary points to any problem of the form
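$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where $\beta \in [0,\infty)$.
• $\Delta$ is a linear subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.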
Thesis Topics
• The Data Clustering Problem
• The Neural Coding Problem
• Information Theory / Probability Theory
• Optimization Theory
• Dynamical Systems
• Bifurcation Theory with Symmetries
• Group Theory
• Continuation Techniques
Outline of this talk
• The Data Clustering Problem
• A Class of Optimization Problems
• Bifurcation with Symmetries
• Numerical Results
The Data Clustering Problem
• Data Classification: identifying all of the books printed in 2002 which address the martial art Kempo
• Data Compression: converting a bitmap file to a jpeg file
[Figure: a clustering q(YN|Y) maps the K objects {yi} of Y to the N objects {yNi} of YN.]
A Symmetry: invariance to relabelling of the clusters of YN
[Figure: the same clustering q(YN|Y) of the K objects {yi} into the N objects {yNi}, shown twice with the labels "class 1" and "class 2" of the clusters of YN interchanged.]
Requirements of a Clustering Method
• The original data is represented reasonably well by the clusters
– Choosing a cost function, D(Y,YN) , called a distortion function, rigorously defines what we mean by the “data is represented reasonably well”.
• Fast implementation
Examples: optimizing at a distortion level $D(Y,Y_N) \le D_0$
• Deterministic Annealing (Rose 1998): A Fast Clustering Algorithm
$\max_{q \in \Delta,\, C} H(Y_N|Y)$ constrained by $D(Y,Y_N) \le D_0$
• Rate Distortion Theory (Shannon ~1950): Minimum Informative Compression
$\min_{q \in \Delta} I(Y,Y_N)$ constrained by $D(Y,Y_N) \le D_0$
where
$\Delta := \{ q(y_N|y) \mid \sum_{y_N \in Y_N} q(y_N|y) = 1 \ \forall y \in Y \} \subset \mathbb{R}^{NK}$
Inputs and Outputs and Clustered Outputs
• The Information Distortion method clusters the outputs Y into clusters YN so that the information that one can learn about X by observing YN, I(X;YN), is as close as possible to the mutual information I(X;Y).
• The corresponding information distortion function is
$D_I(Y;Y_N) = I(X;Y) - I(X;Y_N)$
[Figure: the inputs X (L objects {xi}) and the outputs Y (K objects {yi}) are related by p(X,Y); the outputs are grouped into the clusters YN (N objects {yNi}) by q(YN|Y).]
Two optimization problems which use the information distortion function
• Information Distortion Method (Dimitrov and Miller 2001)
$\max_{q \in \Delta} H(Y_N|Y)$ constrained by $D_I(Y,Y_N) \le D_0$
$\max_{q \in \Delta} H(Y_N|Y) + \beta I(X;Y_N)$
• Information Bottleneck Method (Tishby, Pereira, Bialek 1999)
$\min_{q \in \Delta} I(Y,Y_N)$ constrained by $D_I(Y,Y_N) \le D_0$
$\max_{q \in \Delta} -I(Y,Y_N) + \beta I(X;Y_N)$
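The quantities that appear in both methods can be evaluated directly from a joint distribution p(X,Y) and a clustering q(YN|Y). The following is a minimal sketch, not the author's code: the function names, the array layout (q stored as an N x K array with columns summing to one) and the small example distribution are assumptions made for illustration.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in nats for a joint distribution stored as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0                                  # convention: 0 log 0 = 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

def information_terms(pxy, q):
    """pxy: L x K joint p(X,Y); q: N x K array with q[nu, y] = q(y_N = nu | y)."""
    py = pxy.sum(axis=0)                            # p(y)
    px_yn = pxy @ q.T                               # p(x, y_N) = sum_y p(x,y) q(y_N|y)
    safe_q = np.where(q > 0, q, 1.0)                # log(1) = 0 where q = 0
    h_cond = -float(np.sum(py * np.sum(q * np.log(safe_q), axis=0)))
    i_xy = mutual_information(pxy)
    i_xyn = mutual_information(px_yn)
    return {"H(YN|Y)": h_cond, "I(X;YN)": i_xyn, "D_I": i_xy - i_xyn}

# Hypothetical 2 x 2 example.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
uniform = information_terms(pxy, np.full((2, 2), 0.5))   # q = 1/N
identity = information_terms(pxy, np.eye(2))             # every y in its own class
```

With the uniform clustering the entropy term is maximal and I(X;YN) vanishes, while the identity clustering recovers all of I(X;Y), so the information distortion D_I is zero.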
An annealing algorithm to solve
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_q G(q)$, and let $\beta_0 = 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some K.
1. Perform $\beta$-step: Let $\beta_{k+1} = \beta_k + d_k$ where $d_k > 0$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + \eta$ for some small perturbation $\eta$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ to get the maximizer $q_{k+1}$, using initial guess $q_{k+1}^{(0)}$.
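The three steps above can be sketched for the Information Distortion cost H(YN|Y) + βI(X;YN). This is a sketch under stated assumptions, not the optimizer used in the thesis. Step 3 is replaced by a plain self-consistent iteration obtained by setting the gradient of the Lagrangian to zero, q(ν|y) ∝ exp(β Σ_x p(x|y) log p(x|ν)). The test distribution, step size, perturbation size and function names are all hypothetical, and p(X,Y) is assumed strictly positive.

```python
import numpy as np

def mutual_information(pab):
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

def fixed_point_step(pxy, q, beta):
    """One update q(nu|y) proportional to exp(beta * sum_x p(x|y) log p(x|nu))."""
    px_given_y = pxy / pxy.sum(axis=0)              # L x K
    px_nu = pxy @ q.T                               # L x N, joint p(x, nu)
    px_given_nu = px_nu / px_nu.sum(axis=0)         # L x N
    logits = beta * (np.log(px_given_nu).T @ px_given_y)   # N x K
    logits -= logits.max(axis=0)                    # numerical stability
    q_new = np.exp(logits)
    return q_new / q_new.sum(axis=0)                # columns sum to one

def anneal(pxy, N, beta_max=5.0, d=0.25, n_inner=300, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    K = pxy.shape[1]
    q = np.full((N, K), 1.0 / N)                    # maximizer of H(YN|Y) at beta = 0
    beta, history = 0.0, []
    while beta < beta_max:
        beta = min(beta + d, beta_max)              # 1. beta-step
        q = q + eps * rng.random((N, K))            # 2. perturbed initial guess
        q /= q.sum(axis=0)
        for _ in range(n_inner):                    # 3. optimization (stand-in solver)
            q = fixed_point_step(pxy, q, beta)
        history.append((beta, mutual_information(pxy @ q.T)))
    return q, history

# Hypothetical joint distribution with two well separated groups of outputs.
cond = np.array([[0.45, 0.45, 0.05, 0.05],
                 [0.45, 0.45, 0.05, 0.05],
                 [0.05, 0.05, 0.45, 0.45],
                 [0.05, 0.05, 0.45, 0.45]])
pxy = 0.25 * cond
q_final, history = anneal(pxy, N=2)
```

In this toy run the clustering stays uniform for small β and the two classes resolve once β is large enough, which is the qualitative behaviour the annealing slides describe.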
Application of the annealing method to the Information Distortion problem $\max_q (H(Y_N|Y) + \beta I(X;Y_N))$ when p(X,Y) is defined by four gaussian blobs
[Figure: the joint distribution p(X,Y) of the inputs X (52 objects) and the outputs Y (52 objects), and the clustering q(YN|Y) of the 52 objects of Y into the N objects of YN, with I(X;YN) = D(q(YN|Y)).]
Observed Bifurcations for the Four Blob problem:
We just saw the optimal clusterings $q^*$ at some $\beta^* = \beta_{\max}$. What do the clusterings look like for $\beta < \beta_{\max}$?
[Figure: bifurcations of $q^*(\beta)$: Observed Bifurcations for the 4 Blob Problem.]
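Conceptual Bifurcation Structure
[Figure: conceptual bifurcation diagram of $q^*$ against $\beta$, starting from $q^* = 1/N$; the unknown later structure is marked "??????".]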
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there?
What do the bifurcating branches look like? Are they subcritical or supercritical?
What is the stability of the bifurcating branches? Is there always a bifurcating branch which contains solutions of the optimization problem?
Are there bifurcations after all of the classes have resolved?
[Figure: Observed Bifurcations for the 4 Blob Problem, shown next to the Conceptual Bifurcation Structure.]
Bifurcations with symmetry
• To better understand the bifurcation structure, we capitalize on the symmetries of the function $G(q) + \beta D(q)$.
• The “obvious” symmetry is that $G(q) + \beta D(q)$ is invariant to relabelling of the N classes of YN.
• The symmetry group of all permutations on N symbols is $S_N$.
[Figure: example of a relabelling: switch labels 1 and 3.]
Symmetry Breaking Bifurcations
[Figure: bifurcation diagram of $q^*$ with the branches labelled: $q_{1/N}$ is fixed by $S_N = S_4$; $q^*$ is fixed by $S_{N-1} = S_3$; $q^*$ is fixed by $S_{N-2} = S_2$.]
Existence Theorems for Bifurcating Branches
Given a bifurcation at a point fixed by $S_N$,
• Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1)
• There are N bifurcating branches, each of which has symmetry $S_{N-1}$.
• The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6)
• There are bifurcating branches which have symmetry $\langle (N\text{-cycle})^p \rangle$ for every prime p|N, p<N.
Given a bifurcation at a point fixed by $S_{N-1}$,
• Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1)
• Gives N-1 bifurcating branches which have symmetry $S_{N-2}$.
• The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6)
• Gives bifurcating branches which have symmetry $\langle (M\text{-cycle})^p \rangle$, with M = N-1, for every prime p|N-1, p<N-1.
When N = 4, N-1 = 3, there are no bifurcating branches given by the SW Theorem.
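The prime condition in the Smoller-Wasserman statements is easy to tabulate. The helper below is only an illustrative sketch (the function name is hypothetical); it lists the primes p with p | M and p < M for a given M.

```python
def sw_primes(m):
    """Primes p with p | m and p < m, the values of p in the statement above."""
    def is_prime(p):
        return p > 1 and all(p % d for d in range(2, int(p ** 0.5) + 1))
    return [p for p in range(2, m) if m % p == 0 and is_prime(p)]

primes_for_4 = sw_primes(4)   # bifurcation from a point fixed by S_4
primes_for_3 = sw_primes(3)   # bifurcation from a point fixed by S_3
```

For M = 4 the only admissible prime is 2, and for M = 3 the list is empty, which matches the remark that the SW Theorem gives no branches when N-1 = 3.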
[Figure: a partial subgroup lattice for S4 and the corresponding bifurcating directions given by the Equivariant Branching Lemma. The lattice shows S4 at the top, four copies of S3, twelve copies of S2, and the trivial group 1. Each S3 node carries a direction with entries 3v, v, v, v, 0 and each S2 node a direction with entries 2v, v, v, 0, 0 (up to sign and order).]
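[Figure: a partial subgroup lattice for S4 and the corresponding bifurcating directions given by the Smoller-Wasserman Theorem. The lattice shows S4, A4, the cyclic subgroup generated by (1324), and three subgroups labelled "12,34", "13,24" and "14,23", each with a bifurcating direction whose four blocks are v up to sign. Annotations: Fix(A4) = {0} and Fix(<(1234)>) = {0}.]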
Conceptual Bifurcation Structure and Group Structure
[Figure: the conceptual bifurcation diagram of $q^*$ next to the subgroup lattice S4, S3 (four copies), S2 (twelve copies), 1.]
The Equivariant Branching Lemma shows that the bifurcation structure from $S_M$ to $S_{M-1}$ is …
[Figure: the conceptual bifurcation diagram of $q^*$ next to the subgroup lattice S4, A4, <(1324)> and the subgroups labelled "12,34", "13,24", "14,23".]
The Smoller-Wasserman Theorem shows additional structure … 3 branches from the S4 to S3 bifurcation only.
If we stay on a branch which is fixed by SM , how many bifurcations are there?
Theorem: There are exactly K/N bifurcations on the branch $(q_{1/N}, \beta)$ for the Information Distortion problem.
There are 13 bifurcations on the first branch (K/N = 52/4 for the 4 Blob problem).
Bifurcation theory in the presence of symmetries
enables us to answer the questions previously posed …
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations? There are N-1 symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ for $M \le N$.
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there? There are at least N from the first bifurcation, at least N-1 from the next one, etc.
What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator $\zeta(q^*,\beta^*,u_k)$.
What is the stability of the bifurcating branches? Is there always a bifurcating branch which contains solutions of the optimization problem? No.
Are there bifurcations after all of the classes have resolved? In general, no.
[Figures: Conceptual Bifurcation Structure; Observed Bifurcations for the 4 Blob Problem.]
We can explain the bifurcation structure
of all problems of the form
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where $\beta \in [0,\infty)$.
• $\Delta$ is a subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D are invariant to relabelling of the classes of YN.
• The blocks of the Hessian $\Delta_q (G + \beta D)$ at bifurcation satisfy a set of generic conditions.
This class of problems includes the Information Distortion problem.
$\Delta_{q,\lambda} \mathcal{L}$ is the constrained Hessian; $\Delta_q F$ is the unconstrained Hessian.
[Diagram: singularity structure. Starting from "$\Delta_{q,\lambda} \mathcal{L}$ is singular", the diagram branches on whether $\Delta_q F$ is singular or nonsingular, on the value of M compared with 1, and on whether $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular or nonsingular. The outcomes shown are: symmetry breaking bifurcation, saddle-node bifurcation, impossible scenario (twice), and non-generic, with pointers to chapters 4, 6 and 8.]
Equivariant Branching Lemma: Previous vs. Actual Bifurcation Structure
We used Continuation Techniques and the Theory of Bifurcations with Symmetries on the 4 Blob Problem using the Information Distortion method to get this picture.
[Figure: two bifurcation diagrams, "Previous results" and "Actual structure", with markers for the singularity of F and the singularity of $\mathcal{L}$.]
The bifurcation from S4 to S3 is subcritical … (the theory predicted this since the bifurcation discriminator $\zeta(q_{1/4},\beta^*,u) < 0$)
Conclusions
Theorem: In general, either symmetry breaking bifurcations or saddle-node bifurcations can occur.
Outline of proof: The Equivariant Branching Lemma, the Smoller-Wasserman Theorem, and the following singularity structure:
[Diagram: starting from "$\Delta_{q,\lambda} \mathcal{L}$ is singular", the cases "$\Delta_q F$ is singular" and "$\Delta_q F$ is non-singular" are split by the value of M compared with 1 and by whether $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular or non-singular. Outcomes: symmetry breaking bifurcation, saddle-node bifurcation, impossible scenario (twice), non-generic.]
Conclusions
Theorem: All symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ are pitchfork-like, and there exist M bifurcating branches, for which we have explicit directions.
Conclusions
Theorem: The bifurcation discriminator of the pitchfork-like branch $(q^*,\lambda^*,\beta^*) + (t u_k, 0, \beta(t))$ is
[Equation: $\zeta(q^*,\beta^*,u_k)$, expressed through derivatives of F and $\mathcal{L}$ at $(q^*,\lambda^*,\beta^*)$.]
If $\zeta(q^*,\beta^*,u_k) < 0$, then the branch is subcritical. If $\zeta(q^*,\beta^*,u_k) > 0$, then the branch is supercritical.
Conclusions
Theorem: Solutions of the optimization problem do not always persist from bifurcation.
Theorem: In general, bifurcations do not occur after all of the classes have resolved.
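A numerical algorithm to solve $\max_{q \in \Delta} (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $\Delta s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some K.
1. Perform $\beta$-step: solve
$\Delta_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k)$
for $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$ where $d_k = (\Delta s \, \mathrm{sgn}(\cos\theta)) / (\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1)^{1/2}$.
2. The initial guess for $(q_{k+1},\lambda_{k+1})$ at $\beta_{k+1}$ is $(q_{k+1}^{(0)},\lambda_{k+1}^{(0)}) = (q_k,\lambda_k) + d_k (\partial_\beta q_k, \partial_\beta \lambda_k)$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ using pseudoarclength continuation to get the maximizer $q_{k+1}$ and the vector of Lagrange multipliers $\lambda_{k+1}$, using initial guess $(q_{k+1}^{(0)},\lambda_{k+1}^{(0)})$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q [G(q_k) + \beta_k D(q_k)]$ and $\Delta_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$ where u is a bifurcating direction and repeat step 3.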
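The determinant test in step 4 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical and the 2 x 2 block family used to exercise the test is invented. A sign change of the determinant only flags crossings where an odd number of eigenvalues change sign.

```python
import numpy as np

def det_sign(block):
    """Sign of det(block), computed without overflow via slogdet."""
    sign, _ = np.linalg.slogdet(block)
    return sign

def bifurcation_detected(block_k, block_k1):
    """True if the determinant changes sign (or vanishes) between two steps."""
    return det_sign(block_k) * det_sign(block_k1) <= 0

def toy_block(beta):
    # Hypothetical Hessian block whose first eigenvalue crosses zero at beta = 2.
    return np.array([[beta - 2.0, 0.0],
                     [0.0, -1.0]])

crossed = bifurcation_detected(toy_block(1.5), toy_block(2.5))
not_crossed = bifurcation_detected(toy_block(0.5), toy_block(1.5))
```

Comparing one identical block rather than the full Hessian keeps the test cheap, since the M unresolved classes share the same block.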
Details …
• The Dynamical System
• Types of Singularities
• Continuation Techniques
• The Explicit Group of Symmetries
• Explicit Existence Theorems for bifurcating branches
A Class of Problems
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
• G and D are sufficiently smooth in $\Delta$.
• G and D must be invariant under relabelling of the classes.
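The Dynamical System
Goal: To determine the bifurcation structure of solutions to $\max_{q \in \Delta} (G(q) + \beta D(q))$ for $\beta \in [0,\infty)$.
Method: Study the equilibria of the flow
$\begin{pmatrix} \dot q \\ \dot\lambda \end{pmatrix} = \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$, where
$\mathcal{L}(q,\lambda,\beta) := G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \left( \sum_z q(z|y) - 1 \right)$ and $\nabla_{q,\lambda} \mathcal{L} : \mathbb{R}^{n+K} \times \mathbb{R} \to \mathbb{R}^{n+K}$.
• The Jacobian wrt q of the K constraints $\{\sum_{Y_N} q(Y_N|y) - 1\}$ is $J = (I_K \ I_K \ \dots \ I_K)$.
• If $w^T \Delta_q F(q^*,\beta) w < 0$ for every $w \in \ker J$, then $q^*(\beta)$ is a maximizer of the optimization problem.
• The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.
• In our dynamical system, the Hessian
$\Delta_{q,\lambda} \mathcal{L}(q,\lambda,\beta) = \begin{pmatrix} \Delta_q F & J^T \\ J & 0 \end{pmatrix}$
determines the stability of equilibria and the location of bifurcation.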
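The sufficient condition on ker J and the block form of the constrained Hessian can be checked numerically. The sketch below assumes q is stored as a vector of N stacked blocks of length K. The function names and the small diagonal test Hessians are hypothetical and only illustrate the linear algebra.

```python
import numpy as np

def constraint_jacobian(N, K):
    """J = (I_K I_K ... I_K), the K x NK Jacobian of the constraints."""
    return np.hstack([np.eye(K)] * N)

def kernel_basis(J, tol=1e-10):
    """Orthonormal basis of ker J (columns), from the SVD of J."""
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def is_constrained_maximizer(hess_F, J):
    """True if w^T hess_F w < 0 for every nonzero w in ker J."""
    Z = kernel_basis(J)
    return bool(np.all(np.linalg.eigvalsh(Z.T @ hess_F @ Z) < 0))

def constrained_hessian(hess_F, J):
    """Block matrix [[hess_F, J^T], [J, 0]]."""
    K = J.shape[0]
    return np.block([[hess_F, J.T], [J, np.zeros((K, K))]])

N, K = 2, 2
J = constraint_jacobian(N, K)
# Indefinite on the whole space, yet negative definite on ker J.
hess_a = np.diag([1.0, 1.0, -3.0, -3.0])
# Positive along ker J, so the condition fails.
hess_b = np.diag([1.0, 1.0, -0.5, -0.5])
big = constrained_hessian(hess_a, J)
```

The first test Hessian shows why the condition is stated on ker J only: the unconstrained Hessian may have positive eigenvalues while the point is still a constrained maximizer.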
Properties of the Dynamical System
$\Delta_{q,\lambda} \mathcal{L}$ is the constrained Hessian; $\Delta_q F$ is the unconstrained Hessian.
[Diagram: the singularity structure of $\Delta_{q,\lambda} \mathcal{L}$ and $\Delta_q F$, split by the value of M and by whether $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular, with the outcomes symmetry breaking bifurcation, saddle-node bifurcation, impossible scenario and non-generic (chapters 4, 6 and 8).]
The Dynamical System
How:
Use numerical continuation in a constrained system to choose $\beta$ and to choose an initial guess to find the equilibria $q^*(\beta)$.
Use bifurcation theory with symmetries to understand bifurcations of the equilibria.
Investigating the Dynamical System
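Continuation
• A local maximum $q_k^*(\beta_k)$ of the optimization problem is an equilibrium of the gradient flow.
• Initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system
$\Delta_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k)$
• The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton’s method.
• Parameter continuation follows the dashed (---) path, pseudoarclength continuation follows the dotted (…) path.
[Figure: a solution curve $(q, \beta)$ with the points $(q_k, \beta_k)$ and $(q_{k+1}, \beta_{k+1})$, the tangent direction at $(q_k, \beta_k)$, and the initial guesses $(q_{k+1}^{(0)}, \beta_{k+1}^{(0)})$ reached by parameter continuation and by pseudoarclength continuation.]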
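The Groups
• Let $\mathcal{P}$ be the finite group of n × n “block” permutation matrices which represents the action of $S_N$ on q and $F(q,\beta)$. For example, if N=3,
$\rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix} \in \mathcal{P}$
permutes $q(Y_{N1}|y)$ with $q(Y_{N2}|y)$ for every y.
• $F(q,\beta)$ is $\mathcal{P}$-invariant means that for every $\rho \in \mathcal{P}$, $F(\rho q, \beta) = F(q,\beta)$.
• Let $\Gamma$ be the finite group of (n+K) × (n+K) block permutation matrices which represents the action of $S_N$ on $(q,\lambda)$ and $\nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$:
$\Gamma := \left\{ \begin{pmatrix} \rho & 0_{n \times K} \\ 0_{K \times n} & I_K \end{pmatrix} \,\middle|\, \rho \in \mathcal{P} \right\}$ (the Lagrange multipliers and constraints are fixed!)
• $\nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$ is $\Gamma$-equivariant means that for every $\gamma \in \Gamma$, $\nabla_{q,\lambda} \mathcal{L}(\gamma (q,\lambda), \beta) = \gamma \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$.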
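The block permutation matrices can be built with a Kronecker product. The following is a minimal sketch with hypothetical function names; it assumes q is stored as N stacked blocks of length K and uses the N=3 relabelling that swaps classes 1 and 2.

```python
import numpy as np

def block_permutation(perm, K):
    """n x n block matrix (n = N*K) that moves block i of q to position perm[i]."""
    N = len(perm)
    small = np.zeros((N, N))
    for i, j in enumerate(perm):
        small[j, i] = 1.0                  # class i is relabelled as class j
    return np.kron(small, np.eye(K))

def lift_to_gamma(rho, K):
    """Extend rho to act on (q, lambda) while fixing the K Lagrange multipliers."""
    n = rho.shape[0]
    return np.block([[rho, np.zeros((n, K))],
                     [np.zeros((K, n)), np.eye(K)]])

N, K = 3, 2
rho = block_permutation([1, 0, 2], K)      # swap classes 1 and 2, fix class 3
q = np.arange(N * K, dtype=float)          # blocks (0,1), (2,3), (4,5)
swapped = rho @ q
J = np.hstack([np.eye(K)] * N)             # constraint Jacobian (I_K I_K I_K)
gamma = lift_to_gamma(rho, K)
```

Because J consists of N identical blocks, J rho = J, which is the matrix form of the statement that the constraints are fixed by the group action.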
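Notation and Definitions
• The symmetry of $(q,\lambda)$ is measured by its isotropy subgroup
$\Sigma_{(q,\lambda)} := \{ \gamma \in \Gamma \mid \gamma (q,\lambda) = (q,\lambda) \}$
• An isotropy subgroup $\Sigma$ is a maximal isotropy subgroup of $\Gamma$ if there does not exist an isotropy subgroup $\Sigma'$ of $\Gamma$ such that $\Sigma \subsetneq \Sigma' \subsetneq \Gamma$.
• At bifurcation $(q^*,\lambda^*,\beta^*)$, the fixed point subspace of $\Sigma_{(q^*,\lambda^*)}$ is
$\mathrm{Fix}(\Sigma_{(q^*,\lambda^*)}) := \{ w \in \ker \Delta_{q,\lambda} \mathcal{L}(q^*,\lambda^*,\beta^*) \mid \gamma w = w \ \forall \gamma \in \Sigma_{(q^*,\lambda^*)} \}$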
Equivariant Branching Lemma
One of the Existence Theorems we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1).
Idea: The bifurcation structure of local solutions is described by the isotropy subgroups $\Sigma$ of $\Gamma$ which have dim Fix($\Sigma$) = 1.
• System: $\dot x = r(x,\beta)$, $r : \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^m$.
• $r(x,\beta)$ is G-equivariant for some compact Lie Group G.
• $r(0,0) = 0$, $\partial_x r(0,0) = 0$.
• Fix(G) = {0}.
• Let H be an isotropy subgroup of G such that dim Fix(H) = 1.
• Assume $\partial_{x\beta} r(0,0) \ne 0$ (crossing condition).
Then there is a unique smooth solution branch $(t x_0, \beta(t))$ to r = 0 such that $x_0 \in$ Fix(H) and the isotropy subgroup of each solution is H.
From bifurcation, the Equivariant Branching Lemma shows that the following solutions emerge:
A stationary point $q^*$ is M-uniform if there exists $1 \le M \le N$ and a K x 1 vector P such that $q(y_{Ni}|Y) = P$ for M and only M classes, $\{y_{Ni}\}_{i=1}^{M}$, of YN. These M classes of YN are unresolved classes. The classes of YN that are not unresolved are called resolved.
The first equilibrium, $q^* \equiv 1/N$, is N-uniform.
Theorem: q* is M-uniform if and only if q* is fixed by SM.
Symmetry Breaking from SM to SM-1
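Kernel of the Hessian at Symmetry Breaking Bifurcation
Theorem: dim ker $\Delta_q F(q^*,\beta)$ = M with basis vectors $\{v_i\}_{i=1}^{M}$, where
$[v_i]_\nu := v$ if $\nu$ is the $i$th unresolved class, and 0 otherwise.
Theorem: dim ker $\Delta_{q,\lambda} \mathcal{L}(q^*,\lambda,\beta)$ = M-1 with basis vectors
$\begin{pmatrix} v_i \\ 0 \end{pmatrix} - \begin{pmatrix} v_M \\ 0 \end{pmatrix}$, $i = 1, \dots, M-1$.
Point: Since the bifurcating solutions whose existence is guaranteed by the EBL and the SW Theorem are tangential to ker $\Delta_{q,\lambda} \mathcal{L}(q^*,\lambda,\beta)$, we know the explicit form of the bifurcating directions.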
Symmetry Breaking Bifurcation from M-uniform solutions
Assumptions:
• Let $q^*$ be M-uniform.
• Call the M identical blocks of $\Delta_q F(q^*,\beta)$: B. Call the other N-M blocks of $\Delta_q F(q^*,\beta)$: $\{R_i\}$. We assume that B has a single nullvector v and that $R_i$ is nonsingular for every i.
• If M<N, then $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is nonsingular.
Theorem: Let $(q^*,\lambda^*,\beta^*)$ be a singular point of the flow
$\begin{pmatrix} \dot q \\ \dot\lambda \end{pmatrix} = \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta)$
such that $q^*$ is M-uniform. Then there exist M bifurcating (M-1)-uniform solutions $(q^*,\lambda^*,\beta^*) + (t u_k, 0, \beta(t))$, where
$[u_k]_\nu = (M-1)v$ if $\nu$ is the $k$th unresolved class, $-v$ if $\nu$ is any other unresolved class, and 0 otherwise.
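The explicit bifurcating directions are simple to assemble once the nullvector v of the block B is known. The sketch below is illustrative only: the function name, the zero-based class indices and the sample vector v are assumptions, and only the q-part of the direction is built (the multiplier part is zero).

```python
import numpy as np

def bifurcating_direction(k, unresolved, N, v):
    """q-part of u_k: (M-1)v in class k, -v in the other unresolved classes, 0 elsewhere."""
    v = np.asarray(v, dtype=float)
    M = len(unresolved)
    u = np.zeros((N, len(v)))
    for nu in unresolved:
        u[nu] = (M - 1) * v if nu == k else -v
    return u.reshape(-1)

v = np.array([1.0, -1.0])                     # hypothetical nullvector of B, K = 2
N, K = 4, 2
J = np.hstack([np.eye(K)] * N)                # constraint Jacobian

# From a 4-uniform solution: direction with blocks (-v, -v, 3v, -v).
u_s4 = bifurcating_direction(2, [0, 1, 2, 3], N, v)
# From a 3-uniform solution with class 2 resolved: blocks (-v, 2v, 0, -v).
u_s3 = bifurcating_direction(1, [0, 1, 3], N, v)
total = sum(bifurcating_direction(k, [0, 1, 2, 3], N, v) for k in range(4))
```

Each direction satisfies J u = 0, so it respects the constraints, and the M directions sum to zero, consistent with a kernel of dimension M-1.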
Some of the bifurcating branches when N = 4 are given by the following isotropy subgroup lattice for S4
[Figure: isotropy subgroup lattice with S4 at the top, four copies of S3, twelve copies of S2, and the trivial group 1. The S3 nodes carry directions with entries 3v, v, v, v, 0 and the S2 nodes directions with entries 2v, v, v, 0, 0 (up to sign and order).]
For the 4 Blob problem: The isotropy subgroups and bifurcating directions of the observed bifurcating branches
isotropy group    bif direction
S4                $(-v,-v,3v,-v,0)^T$
S3                $(-v,2v,0,-v,0)^T$
S2                $(-v,0,0,v,0)^T$
1                 … No more bifs!
Smoller-Wasserman Theorem
The other Existence Theorem: Smoller-Wasserman Theorem (1985-6)
For variational problems where
$r(x,\beta) = \nabla_x f(x,\beta)$
there is a bifurcating solution tangential to Fix(H) for every maximal isotropy subgroup H, not only those with dim Fix(H) = 1.
• dim Fix(H) = 1 implies that H is a maximal isotropy subgroup.
The Smoller-Wasserman Theorem shows that (under the same assumptions as before) if M is composite, then there exist bifurcating solutions with isotropy group $\langle \gamma^p \rangle$ for every element $\gamma$ of order M in $\Gamma$ and every prime p|M, p<M. Furthermore,
dim (Fix $\langle \gamma^p \rangle$) = p-1
Other branches
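Bifurcating branches from a 4-uniform solution are given by the following isotropy subgroup lattice for S4
[Figure: subgroup lattice with S4, A4, <(1324)> and three subgroups labelled "12,34", "13,24" and "14,23", each with a bifurcating direction whose four blocks are v up to sign. Annotations: Fix(A4) = {0} and Fix(<(1234)>) = {0}.]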
Issues: SM
• The full lattice of subgroups of the group SM is not known for arbitrary M.
• The lattice of maximal subgroups of the group SM is not known for arbitrary M.
More about the Bifurcation Structure
Theorem: All symmetry breaking bifurcations from SM to SM-1 are pitchfork-like.
Outline of proof: $\beta'(0) = 0$ since $\partial^2_{xx} r(0,0) = 0$.
Theorem: The bifurcation discriminator of the pitchfork-like branch $(q^*,\lambda^*,\beta^*) + (t u_k, 0, \beta(t))$ is
[Equation: $\zeta(q^*,\beta^*,u_k)$, expressed through derivatives of F and $\mathcal{L}$ at $(q^*,\lambda^*,\beta^*)$.]
If $\zeta(q^*,\beta^*,u_k) < 0$, then the branch is subcritical. If $\zeta(q^*,\beta^*,u_k) > 0$, then the branch is supercritical.
Theorem: Generically, bifurcations do not occur after all of the classes have resolved.
Theorem: If dim (ker $\Delta_{q,\lambda} \mathcal{L}(q^*,\lambda,\beta)$) = 1, and if a crossing condition is satisfied, then a saddle-node bifurcation must occur.