The Dynamics of Learning Vector Quantization RUG 10012005
Michael Biehl, Anarta Ghosh
Rijksuniversiteit Groningen, Mathematics and Computing Science
Barbara Hammer
TU Clausthal-Zellerfeld, Institute of Computing Science

Introduction
  prototype-based learning from example data: representation, classification
  Vector Quantization (VQ)
  Learning Vector Quantization (LVQ)
The dynamics of learning
  a model situation: randomized data
  learning algorithms for VQ and LVQ
  analysis and comparison: dynamics, success of learning
Summary
Outlook

Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping in clusters of similar data
assignment of feature vector $\xi$ to the closest prototype $w$
(similarity or distance measure, e.g. Euclidean distance)

unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
intuitively clear, plausible procedure:
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function (the quantization error)
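
quantization error
$$H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} \left( \xi^\mu - w_j \right)^2 \prod_{k \neq j} \Theta\left( d_k^\mu - d_j^\mu \right)$$
sum over $\mu$: data; sum over $j$: prototypes; the product of step functions $\Theta$ equals 1 only if $w_j$ is the winner
here: Euclidean distance $d_j^\mu = \left( \xi^\mu - w_j \right)^2$
aim: faithful representation (in general $\neq$ clustering)
Result depends on
- the number of prototype vectors
- the distance measure / metric used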
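The winner-takes-all procedure and the quantization error can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `squared_distances`, `quantization_error` and `vq_step` and the toy data are assumptions made here, and constant factors of the gradient are absorbed into the step size `eta`.

```python
import numpy as np

def squared_distances(data, prototypes):
    """d[mu, j] = squared Euclidean distance between example mu and prototype j."""
    diff = data[:, None, :] - prototypes[None, :, :]
    return (diff ** 2).sum(axis=2)

def quantization_error(data, prototypes):
    """H_VQ: for every example, add the squared distance to its winner."""
    return squared_distances(data, prototypes).min(axis=1).sum()

def vq_step(prototypes, xi, eta):
    """Winner-takes-all update: move only the closest prototype towards xi."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    winner = int(np.argmin(d))
    updated = prototypes.copy()
    updated[winner] += eta * (xi - updated[winner])
    return updated, winner

# Toy data (hypothetical): two examples, two prototypes.
data = np.array([[1.0, 1.0], [9.0, 9.0]])
prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
h_before = quantization_error(data, prototypes)           # 2 + 2
prototypes_after, winner = vq_step(prototypes, data[0], eta=0.5)
h_after = quantization_error(data, prototypes_after)      # 0.5 + 2
```

Presenting the first example moves only its winner halfway towards it, which lowers that example's contribution to the quantization error and leaves the other prototype untouched.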

Learning Vector Quantization (LVQ)
aim: classification of data, learning from examples
Learning: choice of prototypes according to example data
example situation: 3 classes, 3 prototypes
classification: assignment of a vector $\xi$ to the class of the closest prototype $w$
aim: generalization ability, i.e. correct classification of novel data after training

mostly: heuristically motivated variations of competitive learning
prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes
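The four steps listed for LVQ 2.1 can be sketched as a single update function. This is a simplified illustration under stated assumptions: the name `lvq21_step` and the toy numbers are hypothetical, and the window rule that Kohonen's formulation uses to restrict updates to examples near the decision boundary is deliberately left out.

```python
import numpy as np

def lvq21_step(prototypes, labels, xi, sigma, eta):
    """One LVQ 2.1-style step for example xi with class label sigma (no window rule)."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    correct = np.flatnonzero(labels == sigma)
    wrong = np.flatnonzero(labels != sigma)
    j = correct[np.argmin(d[correct])]      # closest prototype carrying the correct label
    k = wrong[np.argmin(d[wrong])]          # closest prototype carrying a wrong label
    updated = prototypes.copy()
    updated[j] += eta * (xi - updated[j])   # attraction towards the example
    updated[k] -= eta * (xi - updated[k])   # repulsion away from the example
    return updated

# Toy configuration (hypothetical): one prototype per class.
prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array([0, 1])
updated = lvq21_step(prototypes, labels, xi=np.array([1.0, 0.0]), sigma=0, eta=0.1)
```

The repulsion term is unbounded: a wrong prototype that is pushed away keeps being pushed away, which is one way to see where the stability problems mentioned above come from.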

LVQ algorithms ...
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage (c/o Lidia Sanchez-Gonzalez, Leon, Spain)
[Figure: healthy cells and damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms ...
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
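
model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma \, P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} \left( \xi - \ell B_\sigma \right)^2 \right]$$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$
separation $\ell$: clusters centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)
independent components:
$$\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1, \qquad \langle \xi^2 \rangle_\sigma = N + \ell^2$$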

high-dimensional data (formally $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$ (240 and 160 examples per class)
[Figure: projections into the plane of center vectors $B_+, B_-$: $y_\pm = B_\pm \cdot \xi^\mu$]
[Figure: projections in two independent random directions $w_1, w_2$: $x_{1,2} = w_{1,2} \cdot \xi^\mu$]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification
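The two kinds of projection can be reproduced with a short sampling sketch. The helper `sample_mixture` is hypothetical, and choosing the first two unit vectors as $B_+$ and $B_-$ is an assumption made here for convenience; any orthonormal pair would do. The parameters are those quoted for the figure.

```python
import numpy as np

def sample_mixture(n_examples, N, ell, p_plus, rng):
    """Draw examples from the two-cluster model; returns (xi, sigma, B)."""
    # Orthonormal center vectors: assumed here to be the first two unit vectors.
    B = {+1: np.zeros(N), -1: np.zeros(N)}
    B[+1][0] = 1.0
    B[-1][1] = 1.0
    sigma = np.where(rng.random(n_examples) < p_plus, 1, -1)
    centers = np.where(sigma[:, None] == 1, ell * B[+1], ell * B[-1])
    xi = centers + rng.standard_normal((n_examples, N))   # unit-variance noise
    return xi, sigma, B

rng = np.random.default_rng(1)
xi, sigma, B = sample_mixture(400, N=200, ell=1.0, p_plus=0.6, rng=rng)

# Projections onto the center vectors: the cluster structure is visible here.
y_plus, y_minus = xi @ B[+1], xi @ B[-1]

# Projections onto two random directions: the clusters overlap almost completely.
w1, w2 = rng.standard_normal((2, 200)) / np.sqrt(200)
x1, x2 = xi @ w1, xi @ w2
```

A scatter plot of `(y_plus, y_minus)` against one of `(x1, x2)`, colored by `sigma`, shows why the separation is hard to detect in high dimensions unless the prototypes find the $(B_+, B_-)$-plane.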
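
dynamics of on-line training
sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\xi)$
update of prototype vectors:
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_s^\mu, d_{-s}^\mu, \sigma^\mu \right] \left( \xi^\mu - w_s^{\mu-1} \right), \qquad d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2, \qquad s, \sigma^\mu \in \{+1, -1\}$$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $\left( \xi^\mu - w_s^{\mu-1} \right)$: change of prototype towards or away from the current data
above examples:
- unsupervised Vector Quantization: $f_s = \Theta\left( d_{-s}^\mu - d_s^\mu \right)$, "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s = s \, \sigma^\mu$, i.e. $+1$ for the correct class and $-1$ for the wrong class; here: two prototypes, no explicit competition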
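
mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
- $R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
- $Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes
with $s, t, \sigma \in \{+1, -1\}$
recursions, obtained from the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s \left( \xi^\mu - w_s^{\mu-1} \right)$ with $f_s = f_s\left[ d_s^\mu, d_{-s}^\mu, \sigma^\mu \right]$:
$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$
$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + O(1/N)$$
random vector $\xi^\mu$ enters only in the form of
- projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$
- distances: $d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$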
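The reduction from $2N$ prototype components to seven numbers (four $R_{s\sigma}$ and three independent $Q_{st}$) can be checked numerically. The sketch below is illustrative only: the array layout (row 0 for "+", row 1 for "-"), the random prototypes and the choice of unit vectors for $B_\pm$ are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(2)
N, ell = 50, 1.0
B = np.zeros((2, N))                             # rows: B_+, B_- (assumed unit vectors)
B[0, 0] = 1.0
B[1, 1] = 1.0
w = rng.standard_normal((2, N)) / np.sqrt(N)     # rows: w_+, w_- (arbitrary prototypes)
xi = ell * B[0] + rng.standard_normal(N)         # one example from class +

R = w @ B.T     # R[s, sigma] = w_s . B_sigma
Q = w @ w.T     # Q[s, t]     = w_s . w_t
x = w @ xi      # x[s]        = w_s . xi
y = B @ xi      # y[sigma]    = B_sigma . xi

# The distances need only xi^2, the projections x_s and the norms Q_ss.
d_direct = ((xi - w) ** 2).sum(axis=1)
d_from_projections = xi @ xi - 2.0 * x + np.diag(Q)
assert np.allclose(d_direct, d_from_projections)

# One update of prototype s = 0 with modulation value f reproduces the R recursion.
eta, f, s = 0.5, 1.0, 0
w_new = w.copy()
w_new[s] += (eta / N) * f * (xi - w[s])
R_new = w_new @ B.T
assert np.allclose(R_new[s] - R[s], (eta / N) * f * (y - R[s]))
```

Both identities are exact for any $N$; only the $Q$ recursion involves an approximation, through the neglected $O(1/N)$ term.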
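
2. average over the current example
random vector $\xi^\mu$ acc. to $P(\xi^\mu \mid \sigma^\mu = \sigma)$
in the thermodynamic limit $N \to \infty$: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities,
completely specified in terms of first and second moments (w/o indices $\mu$):
$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \text{ if } \tau = \sigma, \; 0 \text{ else}$$
$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = \delta_{\rho\tau}$$
averaged recursions closed in $\{ R_{s\sigma}, Q_{st} \}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$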
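The conditional moments of the projections can be compared with sample estimates. This is a rough Monte Carlo sketch with hypothetical parameter values; the fixed random prototypes `w` and the unit-vector choice for $B_\pm$ are assumptions, and index 0 stands for "+", index 1 for "-".

```python
import numpy as np

rng = np.random.default_rng(3)
N, ell, n_samples = 100, 1.5, 20000
B = np.zeros((2, N))
B[0, 0] = 1.0
B[1, 1] = 1.0
w = rng.standard_normal((2, N)) / np.sqrt(N)    # fixed, arbitrary prototypes
R, Q = w @ B.T, w @ w.T

sigma = 0                                       # condition on class +
xi = ell * B[sigma] + rng.standard_normal((n_samples, N))
x = xi @ w.T                                    # x[:, s] = w_s . xi
y = xi @ B.T                                    # y[:, tau] = B_tau . xi

mean_x = x.mean(axis=0)                         # compare with ell * R[:, sigma]
mean_y = y.mean(axis=0)                         # compare with (ell, 0)
cov_xx = np.cov(x, rowvar=False)                # compare with Q
cov_xy = (x - mean_x).T @ (y - mean_y) / (n_samples - 1)   # compare with R
```

With this sample size the estimates agree with $\ell R_{s\sigma}$, $Q_{st}$ and $R_{s\tau}$ to within a few hundredths, which is the statistical accuracy to be expected.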

3. self-averaging properties
characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: # of examples, # of learning steps per degree of freedom
recursions → coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
evolution of projections
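
5. learning curve
probability for misclassification of a novel example:
$$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_- = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2 \ell \left[ R_{\sigma\sigma} - R_{-\sigma\sigma} \right]}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$$
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease $-d\varepsilon_g / d\alpha$
- ...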
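Given the characteristic quantities, the generalization error is a plain function of seven numbers. The sketch below evaluates the closed form that follows from the Gaussian statistics of the projections $x_s$ with unit-variance clusters; the function name `generalization_error` and the index convention (0 for "+", 1 for "-") are assumptions made for this illustration.

```python
import math

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from R[s][sigma] and Q[s][t]; index 0 is '+', index 1 is '-'."""
    p = (p_plus, 1.0 - p_plus)
    denom = 2.0 * math.sqrt(Q[0][0] - 2.0 * Q[0][1] + Q[1][1])
    eps = 0.0
    for s in (0, 1):            # s: true class of the novel example
        o = 1 - s               # o: the other class
        arg = (Q[s][s] - Q[o][o] - 2.0 * ell * (R[s][s] - R[o][s])) / denom
        eps += p[s] * Phi(arg)
    return eps

# Sanity check: prototypes placed exactly on the cluster centers, w_s = ell * B_s.
ell = 1.0
R_centers = [[ell, 0.0], [0.0, ell]]
Q_centers = [[ell ** 2, 0.0], [0.0, ell ** 2]]
eps_centers = generalization_error(R_centers, Q_centers, ell, p_plus=0.5)
```

For prototypes on the cluster centers and equal priors the expression reduces to $\Phi(-\ell/\sqrt{2})$, the error of a decision plane halfway between two unit-variance clusters a distance $\sqrt{2}\,\ell$ apart.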

optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with $p_+ \, P(\xi \mid \sigma = +1) = p_- \, P(\xi \mid \sigma = -1)$
[Figure: clusters at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_- > p_+$) with the optimal decision plane; excess error]
[Figure: minimal $\varepsilon_g$ as a function of prior weight $p_+$ for $\ell = 0$, $\ell = 1$, $\ell = 2$]
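The minimal error can be computed by reducing the problem to one dimension along $(B_+ - B_-)/\sqrt{2}$, where the two classes have means $\pm \ell/\sqrt{2}$ and unit variance. This is a sketch of that calculation rather than the authors' code; the function name `optimal_error` is hypothetical.

```python
import math

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal generalization error for two unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus in (0.0, 1.0):
        # No separation (or a single class): always predict the more frequent class.
        return min(p_plus, p_minus)
    a = ell / math.sqrt(2.0)                          # class means at +a and -a
    theta = math.log(p_minus / p_plus) / (2.0 * a)    # p_+ P(u|+) = p_- P(u|-) at u = theta
    # Class + is misclassified below theta, class - above theta.
    return p_plus * Phi(theta - a) + p_minus * Phi(-theta - a)

errors = {ell: optimal_error(0.2, ell) for ell in (0.0, 1.0, 2.0)}
```

For $\ell = 0$ the result is $\min\{p_+, p_-\}$, and it decreases as the separation grows, which is the qualitative shape of the three curves in the figure.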
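
"LVQ 2.1": update the correct and wrong winner
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$
(analytical) integration for $w_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$):
[Equations: closed-form solutions for $R_{s\sigma}(\alpha)$ and $Q_{st}(\alpha)$, containing exponentials $e^{\pm m \eta \alpha}$ and prefactors $(1 \pm m)/(2m)$]
for $\alpha \to \infty$ some of the $R_{s\sigma}$, $Q_{st}$ diverge, while particular combinations of them remain finite
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
[Figure: theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs; $R_{s\sigma}$, $Q_{st}$ vs. $\alpha$ from 0 to 10, vertical axis from $-6$ to $6$]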
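For the projections $R_{s\tau} = w_s \cdot B_\tau$ the integration can be carried out in two lines. The following is a reconstruction from the LVQ 2.1 modulation $f_s = s\sigma$ and the first moments $\langle y_\tau \rangle_\sigma = \ell\,\delta_{\sigma\tau}$, offered as a sketch; it is not copied from the slide, and the equations for $Q_{st}$ are not attempted here.

```latex
% Averaged LVQ 2.1 dynamics of the projections, assuming f_s = s\sigma,
% <y_\tau>_\sigma = \ell \delta_{\sigma\tau}, and p_\pm = (1 \pm m)/2, so that p_+ - p_- = m.
\begin{align*}
\frac{dR_{s\tau}}{d\alpha}
  &= \eta \sum_{\sigma = \pm 1} p_\sigma \, s \sigma
     \left( \ell \, \delta_{\sigma\tau} - R_{s\tau} \right)
   = \eta \, s \left( \tau \, p_\tau \, \ell - m \, R_{s\tau} \right), \\
R_{s\tau}(\alpha)
  &= \frac{\tau \, p_\tau \, \ell}{m} \left( 1 - e^{- s \, m \, \eta \, \alpha} \right)
  \qquad \text{for } R_{s\tau}(0) = 0 .
\end{align*}
```

For $m > 0$ the prototype $w_+$ ($s = +1$) relaxes exponentially to finite projections, while for $w_-$ ($s = -1$) the exponent is positive and the projections grow without bound.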

problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
[Figure: clusters with weights $p_-$ and $p_+ > p_-$]
strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  slow learning, poor generalization
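
"The winner takes it all"
I) LVQ 1 [Kohonen]
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$
only the winner $w_s$ ($s = \pm 1$) is updated, according to the class membership
numerical integration for $w_s(0) = 0$
[Figure: theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs; $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$]
[Figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, axes $R_{S+}$, $R_{S-}$, with cluster centers $\ell B_+$, $\ell B_-$; (•) mark $\alpha = 20, 40, \ldots, 140$; lines: optimal decision boundary and asymptotic position]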
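A direct simulation of the LVQ 1 rule in the two-cluster model is short enough to write out. This is a minimal sketch, not the code behind the figures: `train_lvq1` is a hypothetical name, $B_\pm$ are assumed to be the first two unit vectors, a smaller $N$ and a short training time are used, and exact distance ties (which occur only at the start, when both prototypes are zero) go to $w_+$.

```python
import numpy as np

def train_lvq1(N, ell, p_plus, eta, alpha_max, rng):
    """On-line LVQ 1 with two prototypes w_+ (row 0) and w_- (row 1), w_s(0) = 0."""
    B = np.zeros((2, N))
    B[0, 0] = 1.0
    B[1, 1] = 1.0
    labels = np.array([+1, -1])          # class label s of each prototype
    w = np.zeros((2, N))
    for _ in range(int(alpha_max * N)):  # alpha = (number of examples) / N
        sigma = +1 if rng.random() < p_plus else -1
        xi = ell * B[0 if sigma == +1 else 1] + rng.standard_normal(N)
        d = ((xi - w) ** 2).sum(axis=1)
        s = int(np.argmin(d))            # the winner; the other prototype is not updated
        # Attraction if the winner's label matches sigma, repulsion otherwise.
        w[s] += (eta / N) * labels[s] * sigma * (xi - w[s])
    return w @ B.T, w @ w.T              # R[s, sigma], Q[s, t]

R, Q = train_lvq1(N=100, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=10,
                  rng=np.random.default_rng(4))
```

Averaging `R` and `Q` over independent runs, and recording them at intermediate values of $\alpha$, is the kind of simulation that can be compared with the numerically integrated differential equations.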
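
learning curve (LVQ 1)
[Figure: $\varepsilon_g$ vs. $\alpha$ (0 to 300) for $\eta = 2.0$, $0.4$, $0.2$ ($p_+ = 0.2$, $\ell = 1.2$); $\varepsilon_g$ axis from 0.14 to 0.26]
- role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$ (ODE linear in $\eta$)
[Figure: $\varepsilon_g$ vs. $(\eta \alpha)$ from 0 to 50 in the limit $\eta \to 0$; $\varepsilon_g$ axis from 0.14 to 0.26; marked: min $\varepsilon_g$ and the suboptimal asymptotic value]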

"The winner takes it all"
II) LVQ+ (only positive steps without repulsion)
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \delta_{s, \sigma^\mu} \left( \xi^\mu - w_s^{\mu-1} \right)$$
the winner is updated only if it is correct
LVQ+ ≈ VQ within the classes ($w_s$ updated only from class $s$)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$
[Figure: asymptotic positions of $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

[Figure: asymptotic $\varepsilon_g$ vs. $p_+$ for $\ell = 1.0$; asymptotics: $\eta \to 0$, $(\eta \alpha) \to \infty$]
- optimal classification
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: learning curves $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]

Vector Quantization: competitive learning
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \left( \xi^\mu - w_s^{\mu-1} \right), \qquad w_s: \text{winner}$$
class membership is unknown or identical for all data
numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]
[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ 1]
system is invariant under exchange of the prototypes → weakly repulsive fixed points

interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$

Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of on-line training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, adapted in training
• applications
Vector Quantization (VQ)
aim
representation of large amounts
of data by (few) prototype vectors
example
identification and grouping
in clusters of similar data
assignment of feature vector to the closest prototype w
(similarity or distance measure
eg Euclidean distance )
The Dynamics of Learning Vector Quantization RUG 10012005
unsupervised competitive learning
bull initialize K prototype vectors
bull present a single example
bull identify the closest prototype ie the so-called winner
bull move the winner even closer towards the example
intuitively clear plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function
The Dynamics of Learning Vector Quantization RUG 10012005
quantization error
μj
μk
K
jk
P
1μj
μK
1jVQ ddΘ
2 wξH
μjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors - the distance measure metric used
The Dynamics of Learning Vector Quantization RUG 10012005
Learning Vector Quantization (LVQ)
aim
classification of data
learning from examples
Learning choice of prototypes according to example data
example situtation
3 classes
classification
assignment of a vector to the class of the closest
prototype w
3 prototypes
aim generalization ability ie correct
classification
of novel data after training
The Dynamics of Learning Vector Quantization RUG 10012005
prominent example [Kohonen] ldquo LVQ 21 rdquo
bull present a single example
bull initialize prototype vectors (for different classes)
bull identify the closest correct and the closest wrong prototype
bull move the corresponding winner towards away from the example
known convergence stability problems
eg for infrequent classes
mostly heuristically motivated variations of competitive learning
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms
- are frequently applied in a variety of problems involving
the classification of structured data a few examples
- appear plausible intuitive flexible- are fast easy to implement
- real time speech recognition
- medical diagnosis eg from histological data
- texture recognition and classification
- gene expression data analysis
-
The Dynamics of Learning Vector Quantization RUG 10012005
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding - develop efficient LVQ schemes - test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σN2
-2
1exp
2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
separation ℓ ℓ
jj Bσσξ
22222 Nξ1ξξN
1σσ
j
jjj ξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Ninfin)
400 examples ξμ isinℝN N=200 ℓ=1 p+=06μ
B
yξ
(240)(160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
(240)(160)
projections in two independent random directions w12
μ 11x ξw
model for studying typical behavior of LVQ algorithmsnot density-estimation based classificationNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of on-line training
sequence of independent random data 123μμ ξ acc to μP ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervised Vector Quantization dd f μs
μss
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo σS fs)(1)(1 classcorrect
classwrong
here two prototypes no explicit competition
1-μs
μμs-
μss
1-μs
μs σSddf
N
ηwξww
21
μs
μμsd
1σS
wξ
update of prototype vectors
The Dynamics of Learning Vector Quantization RUG 10012005
Ν1Οffη QxfηQxfη
1N
Ryfη1N
RR
ts1-μ
stμst
1-μst
μts
1-μst
μst
1-μsσ
μσs
1-μsσ
μsσ
2
1-μs
μμs-
μss
1-μs
μs σSddf
N
ηwξww recursions
mathematical analysis of the learning dynamics
1221 -μss
μs
μμs
μμs Q2xd ξwξ
μμμ1-μs
μs ξByx ξwprojections
distances
random vector ξμ enters only in the form of
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x
Bww j
completely specified in terms of first and second moments (wo indices μ)
in the thermodynamic limit N
random vector acc to σ)|P( μ ξμμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
stσtσsσt s Q xx- xx sσσsσ s R yx- yx
yy- yy σσσ
else
σ ifsσσ y
0
S
2 average over the current example
averaged recursions closed in Rsσ Qst p σ1σ
σ
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N (here prop N-1)
μsσ
μst R Q
learning dynamics is completely described in terms of averages
3 self-averaging properties
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions coupled ordinary differential equations
evolution of projections
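5. learning curve

probability for misclassification of a novel example:
$$ \varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_- = \sum_{\sigma = \pm 1} p_\sigma\, \Phi\!\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell \left( R_{\sigma\sigma} - R_{-\sigma\sigma} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2} $$
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease of the generalization error, $-\,d\varepsilon_g / d\alpha$
- ...
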
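The generalization error depends on the prototypes only through $R_{s\sigma}$ and $Q_{st}$, so it can be evaluated without any data. The sketch below assumes the form $\varepsilon_g = \sum_\sigma p_\sigma \Phi(\cdot)$ for unit cluster variances and $w_+ \neq w_-$; the function name and the dict layout of `R` and `Q` are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, p_plus, ell):
    """eps_g from order parameters R[(s, sigma)], Q[(s, t)], labels +1/-1.

    An example of class sigma is misclassified if d_sigma > d_{-sigma};
    x_{-sigma} - x_sigma is Gaussian with variance Q++ - 2 Q+- + Q--,
    which must be positive (the two prototypes must differ).
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    width = math.sqrt(Q[(+1, +1)] - 2.0 * Q[(+1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sig in (+1, -1):
        num = (Q[(sig, sig)] - Q[(-sig, -sig)]
               - 2.0 * ell * (R[(sig, sig)] - R[(-sig, sig)]))
        eps += p[sig] * phi(num / (2.0 * width))
    return eps
```

For prototypes placed exactly on the cluster centers, $w_\pm = \ell B_\pm$, the argument of $\Phi$ is $-\ell/\sqrt{2}$ for both classes, so the result does not depend on the prior weights.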
optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with $p_+ P(\xi \mid \sigma = +1) = p_- P(\xi \mid \sigma = -1)$

[figure: clusters around $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_- > p_+$) with the optimal separating plane; excess error]
[figure: minimal $\varepsilon_g$ as a function of the prior weight $p_+$ for $\ell = 0, 1, 2$]

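For equal unit variances the optimal plane is perpendicular to $B_+ - B_-$, so the minimal error reduces to a one-dimensional problem along the unit vector $(B_+ - B_-)/\sqrt{2}$, where the cluster centers sit at $\pm \ell/\sqrt{2}$. The following sketch works under exactly these assumptions; `optimal_error` is a hypothetical helper name and `p_plus` must lie strictly between 0 and 1.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimal_error(p_plus, ell):
    """Minimal generalization error for two unit-variance clusters."""
    p_minus = 1.0 - p_plus
    if ell == 0.0:
        # Clusters coincide: the best guess is the more frequent class.
        return min(p_plus, p_minus)
    a = ell / math.sqrt(2.0)  # centers at +a and -a along (B+ - B-)/sqrt(2)
    # Threshold t where p_+ P(z | +) = p_- P(z | -): 2 a t = ln(p_- / p_+)
    t = math.log(p_minus / p_plus) / (2.0 * a)
    # Class + is missed below t, class - is missed above t.
    return p_plus * phi(t - a) + p_minus * (1.0 - phi(t + a))
```

The value is symmetric under exchanging $p_+$ and $p_-$ and never exceeds $\min\{p_+, p_-\}$, the error of the trivial classifier.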
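"LVQ 2.1": update the correct and wrong winner
$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, s\, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right) $$
[Seo, Obermayer]: LVQ 2.1 $\leftrightarrow$ cost function (likelihood ratios)

(analytical) integration for $w_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$):
$$ R_{++} = \ell\, \frac{1+m}{2m} \left( 1 - e^{-\eta m \alpha} \right), \qquad R_{+-} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{-\eta m \alpha} \right) $$
$$ R_{-+} = \ell\, \frac{1+m}{2m} \left( 1 - e^{+\eta m \alpha} \right), \qquad R_{--} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{+\eta m \alpha} \right) $$
for $\alpha \to \infty$: $R_{-+}$, $R_{--}$ and the associated $Q$ diverge, while ratios of these quantities remain finite

[figure: characteristic quantities $R$, $Q$ versus $\alpha$ (0 to 10); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]
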
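The exponential growth of the repelled prototype can be checked numerically. The sketch below assumes the averaged LVQ 2.1 equation $dR_{s\tau}/d\alpha = \eta\, s\, (\ell\, \tau\, p_\tau - m R_{s\tau})$, which follows from the $R$ recursion with $f_s = s\sigma$, $\langle \sigma y_\tau \rangle = \ell\, \tau\, p_\tau$ and $\langle \sigma \rangle = m$. It is a reconstruction made for this example, and the function names are hypothetical.

```python
import math


def r_closed_form(s, tau, alpha, eta, ell, p_plus):
    """R_{s tau}(alpha) for LVQ 2.1 with w_s(0) = 0 and m = p_+ - p_- != 0."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    return ell * tau * p_tau / m * (1.0 - math.exp(-s * m * eta * alpha))


def r_euler(s, tau, alpha, eta, ell, p_plus, steps=20000):
    """Euler integration of dR/dalpha = eta * s * (ell*tau*p_tau - m*R)."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * s * (ell * tau * p_tau - m * r)
    return r
```

For $s = +1$ the projections saturate at $\ell\, \tau\, p_\tau / m$, whereas for $s = -1$ the exponent changes sign and the projections grow without bound, which is the instability of the algorithm.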
problem: instability of the algorithm due to repulsion of wrong prototypes
[figure: clusters with prior weights $p_+ > p_-$]
trivial classification for $\alpha \to \infty$: every example is assigned to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$

strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  slow learning, poor generalization

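"The winner takes it all"

I) LVQ 1 [Kohonen]
$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) s\, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right) $$
only the winner $w_s$ is updated, according to the class membership ($s\,\sigma^\mu = \pm 1$)

numerical integration for $w_s(0) = 0$
[figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs]
[figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane relative to $\ell B_+$, $\ell B_-$; markers at $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; solid line: asymptotic position]
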
learning curve (LVQ 1; $\eta = 1.2$, $p_+ = 0.2$, $\ell = 1.2$)
[figure: $\varepsilon_g$ versus $\alpha$ (0 to 300)]

- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
[figure: $\varepsilon_g$ versus $\alpha$ (0 to 50) for $\eta = 2.0,\ 0.4,\ 0.2$ and $\eta \to 0$]

- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$ with $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
- $\min \varepsilon_g$ reached in this limit: suboptimal

"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion)
$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \delta_{s,\sigma^\mu} \left( \xi^\mu - w_s^{\mu-1} \right) $$
update only if the winner is correct
LVQ+ $\approx$ VQ within the classes ($w_s$ updated only from class $s$)

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$
[figure: asymptotic positions of $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

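asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$
[figure: asymptotic $\varepsilon_g$ as a function of $p_+$ for LVQ 2.1, LVQ 1 and LVQ+, compared with the optimal classification]
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

[figure: learning curves $\varepsilon_g(\alpha)$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]
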
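A direct Monte Carlo simulation offers a rough cross-check of such learning curves at finite $N$. The sketch below is only an illustration under stated assumptions: $B_+$ and $B_-$ are taken as the first two coordinate axes, the prototypes start from tiny random values (at exactly $w_s(0) = 0$ the two distances tie), and the names `train` and `estimate_error` are hypothetical.

```python
import random


def sample(n, p_plus, ell, rng):
    """Draw (xi, sigma) from the two-cluster model with B+ = e1, B- = e2."""
    sigma = +1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xi[0 if sigma == +1 else 1] += ell
    return xi, sigma


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(rule, n=20, p_plus=0.2, ell=1.0, eta=1.0, alpha=50, seed=0):
    """On-line training with alpha * n examples; rule is 'lvq1' or 'lvq+'."""
    rng = random.Random(seed)
    w = {s: [1e-3 * rng.gauss(0.0, 1.0) for _ in range(n)] for s in (+1, -1)}
    for _ in range(alpha * n):
        xi, sigma = sample(n, p_plus, ell, rng)
        winner = +1 if sq_dist(xi, w[+1]) < sq_dist(xi, w[-1]) else -1
        if rule == "lvq1":
            f = winner * sigma                     # attract or repel the winner
        elif rule == "lvq+":
            f = 1.0 if winner == sigma else 0.0    # move only a correct winner
        else:
            raise ValueError("unknown rule: " + rule)
        w[winner] = [wj + (eta / n) * f * (xj - wj)
                     for wj, xj in zip(w[winner], xi)]
    return w


def estimate_error(w, n, p_plus, ell, trials=2000, seed=1):
    """Fraction of fresh examples assigned to the wrong class."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        xi, sigma = sample(n, p_plus, ell, rng)
        label = +1 if sq_dist(xi, w[+1]) < sq_dist(xi, w[-1]) else -1
        wrong += label != sigma
    return wrong / trials
```

A typical use is `estimate_error(train("lvq1"), 20, 0.2, 1.0)`; averaging over several seeds and increasing `n` should bring the estimate closer to the theoretical curve, since fluctuations vanish only for large $N$.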
Vector Quantization: competitive learning
$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_s^{\mu} \right) \left( \xi^\mu - w_s^{\mu-1} \right) $$
$w_s$: winner; class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ 1]
[figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300)]

system is invariant under exchange of the prototypes: weakly repulsive fixed points

interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[figure: $\varepsilon_g$ versus $p_+$, asymptotics ($\eta \to 0$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high generalization error $\varepsilon_g$

work in progress, outlook
- regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
- model: different cluster variances, more clusters/prototypes
- optimized procedures: learning rate schedules, variational approach, density estimation, Bayes optimal on-line
- several classes and prototypes

Summary
- prototype-based learning: Vector Quantization and Learning Vector Quantization
- a model scenario: two clusters, two prototypes; dynamics of online training
- comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives
- Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving: SOM; Neural Gas (distance based)
- Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d^{\lambda}(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, with the factors $\lambda_i$ adapted in training
- applications
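
The adaptive distance mentioned under Generalized Relevance LVQ is a weighted squared Euclidean distance. A minimal sketch, assuming non-negative relevance factors `lam` supplied by the caller; how they are adapted during training is not shown, and `relevance_distance` is a hypothetical name.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)**2."""
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (a - b) ** 2 for l, a, b in zip(lam, w, xi))
```

With all factors equal to 1 this reduces to the squared Euclidean distance; a factor of 0 removes the corresponding feature from the comparison.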
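quantization error
$$ H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} \left( \xi^\mu - w_j \right)^2 \prod_{k \neq j} \Theta\!\left( d_k^{\mu} - d_j^{\mu} \right) $$
sums over data ($\mu$) and prototypes ($j$); the product of step functions is 1 if $w_j$ is the winner and 0 otherwise
here: $d_j^{\mu} = (\xi^\mu - w_j)^2$, Euclidean distance

aim: faithful representation (in general $\neq$ clustering)
result depends on
- the number of prototype vectors
- the distance measure / metric used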
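
Because the product of step functions selects the closest prototype, the quantization error is simply the sum over all data points of the squared distance to the respective winner. A minimal sketch with the hypothetical name `quantization_error`, using the Euclidean distance:

```python
def quantization_error(data, prototypes):
    """Sum over data points of the squared distance to the closest prototype."""
    total = 0.0
    for xi in data:
        total += min(
            sum((x - w) ** 2 for x, w in zip(xi, proto))
            for proto in prototypes
        )
    return total
```

Adding prototypes can only lower this value, which is one reason why the result depends on the number of prototype vectors.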

Learning Vector Quantization (LVQ)

aim: classification of data, learning from examples
learning: choice of prototypes according to example data
classification: assignment of a vector $\xi$ to the class of the closest prototype $w$
[figure: example situation with 3 classes and 3 prototypes]
aim: generalization ability, i.e. correct classification of novel data after training

prominent example [Kohonen]: "LVQ 2.1"
- initialize prototype vectors (for different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes
LVQ algorithms: mostly heuristically motivated variations of competitive learning

LVQ algorithms
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[figure: healthy cells, damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

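model situation: two clusters of N-dimensional data

random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$ P(\xi) = \sum_{\sigma = \pm 1} p_\sigma\, P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[ -\frac{1}{2} \left( \xi - \ell B_\sigma \right)^2 \right] $$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+$, $p_-$ with $p_+ + p_- = 1$
separation $\ell$: cluster centers at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)
independent components: $\langle \xi_j \rangle_\sigma = \ell\, (B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$, hence $\langle \xi^2 \rangle_\sigma = N + \ell^2 \approx N$ for large $N$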
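
Sampling from this model is straightforward. The sketch below assumes, for illustration only, that $B_+$ and $B_-$ are the first two coordinate axes, which satisfies the orthonormality condition; `sample_example` and `sample_data` are hypothetical names.

```python
import random


def sample_example(n, p_plus, ell, rng):
    """Draw one (xi, sigma) pair from the mixture of two Gaussians.

    sigma = +1 with probability p_plus, else -1; xi has independent
    unit-variance components centered at ell * B_sigma, with the
    assumed choice B+ = e1 and B- = e2.
    """
    sigma = +1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xi[0 if sigma == +1 else 1] += ell
    return xi, sigma


def sample_data(count, n, p_plus, ell, seed=0):
    """Draw `count` independent examples with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_example(n, p_plus, ell, rng) for _ in range(count)]
```

For instance, `sample_data(400, 200, 0.6, 1.0)` corresponds to the setting of 400 examples with $N = 200$, $\ell = 1$, $p_+ = 0.6$; the class counts then fluctuate around 240 and 160.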

high-dimensional data (formally $N \to \infty$)
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

[figure: 400 examples $\xi^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$ (240 and 160 examples per class)]
- projections into the plane of center vectors $B_+$, $B_-$: $y_\pm^{\mu} = B_\pm \cdot \xi^\mu$
- projections in two independent random directions $w_1$, $w_2$: $x_{1,2}^{\mu} = w_{1,2} \cdot \xi^\mu$

dynamics of on-line training

sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \ldots$, drawn according to $P(\xi^\mu)$

update of prototype vectors:
$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[ d_s^{\mu}, d_{-s}^{\mu}, \sigma^\mu, \ldots \right] \left( \xi^\mu - w_s^{\mu-1} \right), \qquad d_s^{\mu} = \left( \xi^\mu - w_s^{\mu-1} \right)^2 $$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data

above examples:
- unsupervised Vector Quantization: $f_s = \Theta(d_{-s}^{\mu} - d_s^{\mu})$, "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s = s\,\sigma^\mu = +1$ (correct class) or $-1$ (wrong class); here: two prototypes, no explicit competition
The Dynamics of Learning Vector Quantization RUG 10012005
Learning Vector Quantization (LVQ)
aim
classification of data
learning from examples
Learning choice of prototypes according to example data
example situtation
3 classes
classification
assignment of a vector to the class of the closest
prototype w
3 prototypes
aim generalization ability ie correct
classification
of novel data after training
The Dynamics of Learning Vector Quantization RUG 10012005
prominent example [Kohonen] ldquo LVQ 21 rdquo
bull present a single example
bull initialize prototype vectors (for different classes)
bull identify the closest correct and the closest wrong prototype
bull move the corresponding winner towards away from the example
known convergence stability problems
eg for infrequent classes
mostly heuristically motivated variations of competitive learning
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms
- are frequently applied in a variety of problems involving
the classification of structured data a few examples
- appear plausible intuitive flexible- are fast easy to implement
- real time speech recognition
- medical diagnosis eg from histological data
- texture recognition and classification
- gene expression data analysis
-
The Dynamics of Learning Vector Quantization RUG 10012005
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding - develop efficient LVQ schemes - test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σN2
-2
1exp
2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
separation ℓ ℓ
jj Bσσξ
22222 Nξ1ξξN
1σσ
j
jjj ξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Ninfin)
400 examples ξμ isinℝN N=200 ℓ=1 p+=06μ
B
yξ
(240)(160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
(240)(160)
projections in two independent random directions w12
μ 11x ξw
model for studying typical behavior of LVQ algorithmsnot density-estimation based classificationNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of on-line training
sequence of independent random data 123μμ ξ acc to μP ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervised Vector Quantization dd f μs
μss
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo σS fs)(1)(1 classcorrect
classwrong
here two prototypes no explicit competition
1-μs
μμs-
μss
1-μs
μs σSddf
N
ηwξww
21
μs
μμsd
1σS
wξ
update of prototype vectors
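
mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
- $R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
- $Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes
the random vector $\xi^\mu$ enters only in the form of
- projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$
- distances: $d_s^\mu = \left(\xi^\mu - w_s^{\mu-1}\right)^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N}\, f_s[\ldots] \left(\xi^\mu - w_s^{\mu-1}\right)$ yields the recursions
$$R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1} = \frac{\eta}{N}\, f_s \left(y_\sigma^\mu - R_{s\sigma}^{\mu-1}\right)$$
$$Q_{st}^\mu - Q_{st}^{\mu-1} = \frac{\eta}{N} \left[ f_s \left(x_t^\mu - Q_{st}^{\mu-1}\right) + f_t \left(x_s^\mu - Q_{st}^{\mu-1}\right) \right] + \frac{\eta^2}{N}\, f_s f_t + O\!\left(1/N^2\right)$$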
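The reduction from prototype vectors to the seven numbers $R_{s\sigma}$, $Q_{st}$ can be checked numerically on a toy example. The sketch below uses hypothetical numbers with $N = 4$ and compares the order parameters computed after an explicit prototype update with the recursions; for $Q$ it uses the exact second-order term $(\eta/N)^2 f_s f_t\, (\xi - w_s)\cdot(\xi - w_t)$, which reduces to $\eta^2 f_s f_t / N$ for large $N$ because $\xi^2 \approx N$.

```python
LABELS = (+1, -1)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def order_parameters(w, b):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t (4 + 3 independent numbers)."""
    R = {(s, sig): dot(w[s], b[sig]) for s in LABELS for sig in LABELS}
    Q = {(s, t): dot(w[s], w[t]) for s in LABELS for t in LABELS}
    return R, Q

def update(w, xi, eta, f):
    """Explicit prototype update with given modulation values f[s]."""
    n = len(xi)
    return {s: [wj + (eta / n) * f[s] * (xj - wj) for wj, xj in zip(w[s], xi)] for s in LABELS}

# Toy numbers (hypothetical), N = 4
b = {+1: [1.0, 0.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0, 0.0]}
w = {+1: [0.5, 0.1, 0.2, 0.0], -1: [0.0, 0.3, -0.1, 0.4]}
xi = [1.0, -0.5, 0.3, 0.8]
eta, f = 0.5, {+1: 1.0, -1: -1.0}
n = len(xi)

R_old, Q_old = order_parameters(w, b)
R_new, Q_new = order_parameters(update(w, xi, eta, f), b)
x = {s: dot(w[s], xi) for s in LABELS}      # projections on the old prototypes
y = {sig: dot(b[sig], xi) for sig in LABELS}  # projections on the center vectors

def r_recursion(s, sig):
    return R_old[(s, sig)] + (eta / n) * f[s] * (y[sig] - R_old[(s, sig)])

def q_recursion(s, t):
    second_order = sum((xj - a) * (xj - c) for xj, a, c in zip(xi, w[s], w[t]))
    return (Q_old[(s, t)]
            + (eta / n) * (f[s] * (x[t] - Q_old[(s, t)]) + f[t] * (x[s] - Q_old[(s, t)]))
            + (eta / n) ** 2 * f[s] * f[t] * second_order)

max_dev_r = max(abs(R_new[k] - r_recursion(*k)) for k in R_new)
max_dev_q = max(abs(Q_new[k] - q_recursion(*k)) for k in Q_new)
print(max_dev_r, max_dev_q)  # both deviations are at rounding level
```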
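
2. average over the current example
random vector $\xi^\mu$ drawn according to $P(\xi^\mu \mid \sigma)$
in the thermodynamic limit $N \to \infty$ the projections
$$x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu, \qquad y_\sigma^\mu = B_\sigma \cdot \xi^\mu$$
are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):
$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j\, \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j\, (B_\sigma)_j = \ell\, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$
$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$
averaged recursions are closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma=\pm 1} p_\sigma\, \langle \cdots \rangle_\sigma$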

3. self-averaging properties
characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
the learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples (learning steps) per degree of freedom
recursions → coupled ordinary differential equations: evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
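As a worked step, dividing the recursions for $R_{s\sigma}$ and $Q_{st}$ by $1/N$, averaging over the current example and letting $N \to \infty$ (with $\alpha = \mu/N$) would give differential equations of the following form. This is a sketch derived from the update rule $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s\,(\xi^\mu - w_s^{\mu-1})$ under the model assumption $\xi^2/N \to 1$; the averages $\langle \cdots \rangle$ still have to be evaluated for each specific choice of $f_s$.

```latex
\begin{align}
  \frac{dR_{s\sigma}}{d\alpha}
    &= \eta \left( \langle f_s\, y_\sigma \rangle - \langle f_s \rangle R_{s\sigma} \right), \\
  \frac{dQ_{st}}{d\alpha}
    &= \eta \left( \langle f_s\, x_t \rangle - \langle f_s \rangle Q_{st}
                 + \langle f_t\, x_s \rangle - \langle f_t \rangle Q_{st} \right)
     + \eta^2 \langle f_s f_t \rangle .
\end{align}
```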
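
5. learning curve
probability for misclassification of a novel example:
$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$$
$$\varepsilon_g = p_+\, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell \left(R_{++} - R_{-+}\right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_-\, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell \left(R_{--} - R_{+-}\right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right]$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease $-\mathrm{d}\varepsilon_g / \mathrm{d}\alpha$
- ...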
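The generalization error depends on the prototypes only through $R_{s\sigma}$ and $Q_{st}$: an example of class $\sigma$ is misclassified when $d_\sigma > d_{-\sigma}$, and the difference $x_{-\sigma} - x_\sigma$ is Gaussian with mean $\ell (R_{-\sigma\sigma} - R_{\sigma\sigma})$ and variance $Q_{++} - 2Q_{+-} + Q_{--}$. A minimal sketch of the resulting formula, assuming unit cluster variances as in the model; the function names are illustrative.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from R[(s, sigma)] and Q[(s, t)]; requires w_+ != w_- (non-zero spread)."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    spread = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sig in (+1, -1):
        num = Q[(sig, sig)] - Q[(-sig, -sig)] - 2.0 * ell * (R[(sig, sig)] - R[(-sig, sig)])
        eps += p[sig] * phi(num / (2.0 * spread))
    return eps

# Prototypes placed exactly on the cluster centers: w_+ = ell * B_+, w_- = ell * B_-
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, ell, p_plus=0.5)
print(eps)  # equals phi(-ell / sqrt(2)) for equal priors
```

For equal priors and prototypes on the centers the result reduces to $\Phi(-\ell/\sqrt{2})$, the error of the midplane between two unit-variance clusters at distance $\ell\sqrt{2}$.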
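
optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with
$$p_+\, P(\xi \mid \sigma = +1) = p_-\, P(\xi \mid \sigma = -1)$$
[figure: clusters at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_- > p_+$) with the optimal decision plane; the excess error is marked]
[plot: minimal $\varepsilon_g$ as a function of the prior weight $p_+$ for $\ell = 0, 1, 2$; axes $\varepsilon_g$ (0 to 0.50) and $p_+$ (0 to 1.00)]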
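The minimal error of the plot can be computed in closed form. The sketch below assumes the model as stated (unit variances, centers $\ell B_\pm$ at distance $\ell\sqrt{2}$) and projects onto the direction $(B_+ - B_-)/\sqrt{2}$, where the condition $p_+ P(\xi \mid +1) = p_- P(\xi \mid -1)$ becomes a threshold $\theta = \ln(p_-/p_+) / (\ell\sqrt{2})$. The function name `optimal_error` is illustrative.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal generalization error for two unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0                      # only one class present
    if ell == 0.0:
        return min(p_plus, p_minus)     # identical clusters: always guess the frequent class
    dist = ell * math.sqrt(2.0)         # distance between the cluster centers
    theta = math.log(p_minus / p_plus) / dist  # threshold along (B_+ - B_-)/sqrt(2)
    return p_plus * phi(theta - dist / 2.0) + p_minus * phi(-theta - dist / 2.0)

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(optimal_error(p, ell), 3) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

The curves are symmetric under $p_+ \leftrightarrow p_-$ and never exceed $\min\{p_+, p_-\}$, the error of the trivial assignment to the more frequent class.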
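
"LVQ 2.1": update the correct and wrong winner
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, S \sigma^\mu \left(\xi^\mu - w_S^{\mu-1}\right)$$
(analytical) integration for $w_S(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$)
[equations: closed-form solutions for $R_{S\sigma}(\alpha)$ and $Q_{ST}(\alpha)$, with prefactors involving $(1 \pm m)/(2m)$ and exponentials $e^{\pm \eta m \alpha}$]
[for $\alpha \to \infty$: list of divergent quantities $Q$, $R$ and of combinations that remain finite]
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
[plot: characteristic quantities vs. $\alpha$ (0 to 10), vertical axis from −6 to 6; theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]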
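The instability of LVQ 2.1 can be made explicit for the projections $R_{S\tau}$. With $f_S = S\sigma$, $\langle \sigma\, y_\tau \rangle = \tau\, p_\tau\, \ell$ and $\langle \sigma \rangle = p_+ - p_- = m$, the averaged dynamics would read $dR_{S\tau}/d\alpha = \eta\, S\, (\tau\, p_\tau\, \ell - m\, R_{S\tau})$. This equation and its solution below are derived here from the update rule and the model moments as a sketch; they are not copied from the slide, whose formulas are not recoverable from the transcript.

```python
import math

def r_closed_form(S, tau, alpha, eta, ell, p_plus):
    """Solution of dR/dalpha = eta*S*(tau*p_tau*ell - m*R) with R(0) = 0 (requires m != 0)."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    m = p[+1] - p[-1]
    return (tau * p[tau] * ell / m) * (1.0 - math.exp(-S * eta * m * alpha))

def r_euler(S, tau, alpha, eta, ell, p_plus, steps=20000):
    """Forward-Euler integration of the same equation, as an independent check."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    m = p[+1] - p[-1]
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * S * (tau * p[tau] * ell - m * r)
    return r

eta, ell, p_plus, alpha = 0.5, 1.0, 0.8, 10.0
for S in (+1, -1):
    for tau in (+1, -1):
        print(S, tau, r_closed_form(S, tau, alpha, eta, ell, p_plus),
              r_euler(S, tau, alpha, eta, ell, p_plus))
```

For $m > 0$ the prototype of the frequent class ($S = +1$) relaxes as $e^{-\eta m \alpha}$, while the projections of the other prototype ($S = -1$) grow as $e^{+\eta m \alpha}$: on average it is repelled more often than attracted.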

problem: instability of the algorithm due to repulsion of wrong prototypes
→ trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
[figure: clusters with weights $p_-$ and $p_+ > p_-$]
strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function; limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner $w_S$ is updated, according to the class membership
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^\mu - d_S^\mu\right) S \sigma^\mu \left(\xi^\mu - w_S^{\mu-1}\right)$$
numerical integration for $w_S(0) = 0$
[plot: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\sigma}$ vs. $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs]
[plot: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, axes $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell B_+$, $\ell B_-$ marked; (•) $\alpha = 20, 40, \ldots, 140$; lines mark the optimal decision boundary and the asymptotic position]
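A direct simulation of LVQ 1 on the two-cluster model can be sketched in a few lines. The code below is illustrative only: it takes $B_\pm$ as the first two unit vectors, breaks the initial tie at $w_S(0) = 0$ in favour of the prototype $+1$ (an assumption, since $\Theta(0)$ is not specified), and uses a small $N = 50$ so that it runs quickly; `lvq1_step` and `train_lvq1` are hypothetical names.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def lvq1_step(w, xi, sigma, eta):
    """Update only the winner: towards xi if its label matches sigma, away from xi otherwise."""
    n = len(xi)
    # Winner = closer prototype; ties (e.g. both prototypes at the origin) go to +1 (assumption).
    winner = +1 if sq_dist(xi, w[+1]) <= sq_dist(xi, w[-1]) else -1
    sign = winner * sigma
    new_w = {s: list(w[s]) for s in (+1, -1)}
    new_w[winner] = [wj + (eta / n) * sign * (xj - wj) for wj, xj in zip(w[winner], xi)]
    return new_w

def train_lvq1(n=50, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=20.0, seed=0):
    """Run alpha_max * n LVQ 1 steps; return R[(s, sigma)] = w_s . B_sigma."""
    rng = random.Random(seed)
    w = {+1: [0.0] * n, -1: [0.0] * n}
    for _ in range(int(alpha_max * n)):
        sigma = +1 if rng.random() < p_plus else -1
        center = 0 if sigma == +1 else 1  # B_+ = e_0, B_- = e_1 (assumption)
        xi = [rng.gauss(0.0, 1.0) + (ell if j == center else 0.0) for j in range(n)]
        w = lvq1_step(w, xi, sigma, eta)
    return {(s, sig): w[s][0 if sig == +1 else 1] for s in (+1, -1) for sig in (+1, -1)}

R = train_lvq1()
print(R)
```

A single run at small $N$ fluctuates around the ODE prediction; averaging over independent runs (different `seed` values) and increasing `n` would be needed for a comparison with the theory curves.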

learning curve
[plot: $\varepsilon_g$ (0.14 to 0.26) vs. $\alpha$ (0 to 300) for $\eta = 2.0, 0.4, 0.2$; $p_+ = 0.2$, $\ell = 1.2$]
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
- variable rate $\eta(\alpha)$
- well-defined asymptotics (ODE linear in $\eta$): $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$
[plot: $\varepsilon_g$ (0.14 to 0.26) vs. $(\eta\alpha)$ (0 to 50) for $\eta \to 0$; labels: $\min \varepsilon_g$, suboptimal]

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion): the winner is updated only if it is correct
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^\mu - d_S^\mu\right) \delta_{S,\sigma^\mu} \left(\xi^\mu - w_S^{\mu-1}\right)$$
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\,(B_+ + B_-)/2$
[plot: asymptotic positions of $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes ($w_S$ updated only from class $S$)

asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$
[plot: $\varepsilon_g$ vs. $p_+$ for the three algorithms, compared with the optimal classification]
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[plot: learning curves $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]

Vector Quantization
competitive learning: $w_S$ is the winner; class membership is unknown or identical for all data
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^\mu - d_S^\mu\right) \left(\xi^\mu - w_S^{\mu-1}\right)$$
numerical integration for $w_S(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[plot: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]
[plot: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points
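The three winner-takes-all schemes differ only in the modulation factor that multiplies $(\xi^\mu - w_S^{\mu-1})$. The sketch below tabulates that factor for LVQ 1, LVQ+ and VQ in one configuration where the prototype $+1$ is the winner; the function names are illustrative and ties $d_+ = d_-$ are treated as "no winner".

```python
def f_lvq1(S, sigma, d):
    """Theta(d_{-S} - d_S) * S * sigma: the winner moves towards or away from the example."""
    return float(S * sigma) if d[S] < d[-S] else 0.0

def f_lvq_plus(S, sigma, d):
    """Theta(d_{-S} - d_S) * delta_{S, sigma}: the winner moves only if it is correct."""
    return 1.0 if (d[S] < d[-S] and S == sigma) else 0.0

def f_vq(S, sigma, d):
    """Theta(d_{-S} - d_S): the winner always moves towards the example; the label is ignored."""
    return 1.0 if d[S] < d[-S] else 0.0

d = {+1: 1.0, -1: 3.0}  # squared distances: prototype +1 is the winner
table = {}
for sigma in (+1, -1):
    for name, f in (("LVQ1", f_lvq1), ("LVQ+", f_lvq_plus), ("VQ", f_vq)):
        table[(name, sigma)] = [f(S, sigma, d) for S in (+1, -1)]
        print(name, "sigma =", sigma, "f_+, f_- =", table[(name, sigma)])
```

Only the case of a wrong winner ($\sigma = -1$ here) distinguishes the three rules: LVQ 1 repels the winner, LVQ+ leaves it unchanged, and VQ attracts it.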

interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[plot: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high generalization error $\varepsilon_g$
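The contrast between representation and classification can be illustrated by computing the quantization error separately from the classification error. The sketch below uses the mean squared distance to the closest prototype as the quantization error; the normalization by the number of examples and the toy data are assumptions made for illustration.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def quantization_error(prototypes, data):
    """Mean squared Euclidean distance between each example and its closest prototype."""
    return sum(min(sq_dist(xi, w) for w in prototypes) for xi in data) / len(data)

prototypes = [[0.0, 0.0], [4.0, 0.0]]
data = [[1.0, 0.0], [3.0, 0.0], [4.0, 2.0]]
h_vq = quantization_error(prototypes, data)
print(h_vq)  # squared distances to the winners are 1, 1 and 4
```

The quantity ignores class labels entirely, so prototypes that represent a dominant cluster well can still classify poorly.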

work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left(w_{s,i} - \xi_i\right)^2$, with $\lambda_i$ adapted in training
• applications
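The adaptive distance measure mentioned for Generalized Relevance LVQ replaces the plain Euclidean distance by a weighted sum over components. A minimal sketch of the distance itself follows; how the relevance factors $\lambda_i$ are adapted in training is not covered here, and the normalization $\sum_i \lambda_i = 1$ used in the example is an assumption.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)^2."""
    return sum(l * (wi - xj) ** 2 for l, wi, xj in zip(lam, w, xi))

w = [1.0, 2.0, 3.0]
xi = [0.0, 0.0, 0.0]
uniform = [1.0 / 3.0] * 3   # all features equally relevant
focused = [0.5, 0.5, 0.0]   # third feature switched off
print(relevance_distance(w, xi, uniform), relevance_distance(w, xi, focused))
```

Setting a relevance factor to zero removes the corresponding feature from the comparison, which is one way to handle heterogeneous data where features differ in importance.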

mostly heuristically motivated variations of competitive learning
prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes

LVQ algorithms
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data; a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
[figure: healthy cells, damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding - develop efficient LVQ schemes - test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σN2
-2
1exp
2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
separation ℓ ℓ
jj Bσσξ
22222 Nξ1ξξN
1σσ
j
jjj ξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Ninfin)
400 examples ξμ isinℝN N=200 ℓ=1 p+=06μ
B
yξ
(240)(160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
(240)(160)
projections in two independent random directions w12
μ 11x ξw
model for studying typical behavior of LVQ algorithmsnot density-estimation based classificationNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of on-line training
sequence of independent random data 123μμ ξ acc to μP ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervised Vector Quantization dd f μs
μss
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo σS fs)(1)(1 classcorrect
classwrong
here two prototypes no explicit competition
1-μs
μμs-
μss
1-μs
μs σSddf
N
ηwξww
21
μs
μμsd
1σS
wξ
update of prototype vectors
The Dynamics of Learning Vector Quantization RUG 10012005
Ν1Οffη QxfηQxfη
1N
Ryfη1N
RR
ts1-μ
stμst
1-μst
μts
1-μst
μst
1-μsσ
μσs
1-μsσ
μsσ
2
1-μs
μμs-
μss
1-μs
μs σSddf
N
ηwξww recursions
mathematical analysis of the learning dynamics
1221 -μss
μs
μμs
μμs Q2xd ξwξ
μμμ1-μs
μs ξByx ξwprojections
distances
random vector ξμ enters only in the form of
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x
Bww j
completely specified in terms of first and second moments (wo indices μ)
in the thermodynamic limit N
random vector acc to σ)|P( μ ξμμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
stσtσsσt s Q xx- xx sσσsσ s R yx- yx
yy- yy σσσ
else
σ ifsσσ y
0
S
2 average over the current example
averaged recursions closed in Rsσ Qst p σ1σ
σ
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N (here prop N-1)
μsσ
μst R Q
learning dynamics is completely described in terms of averages
3 self-averaging properties
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions coupled ordinary differential equations
evolution of projections
The Dynamics of Learning Vector Quantization RUG 10012005
probability for misclassification of a novel example
ddpddp gε
QQQ
RR2QQ
QQQ
RR2QQpp
22 2
1
2
1
5 learning curve
generalization error εg(α) after training with α N examples
N
- repulsiveattractive fixed points of the dynamics
- asymptotic behavior for - dependence on learning rate separation initialization-
investigation and comparison of given algorithms
- time-dependent learning rate η(α)
- variational optimization wrt fs[]
-
optimization and development of new prescriptions
maximizeα
g
d
d ε
The Dynamics of Learning Vector Quantization RUG 10012005
optimal classification with minimal generalization error
B-
B+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p ξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weightsℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
(analytical)integrationfor ws(0) = 0
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
p = (1+m ) 2 (mgt0)
[Seo Obermeyer] LVQ21 harr cost function
(likelihood ratios)
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 =05 averages over 100 independent runs

problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: every example is assigned to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
[Figure: sketch of the two clusters, ($p_-$) and ($p_+ > p_-$)]
strategies
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function
  limiting case "learning from mistakes": LVQ 2.1 step only if the example is currently misclassified;
  slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]
$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} \Theta(d_{-S}^\mu - d_S^\mu) S \sigma^\mu (\xi^\mu - w_S^{\mu-1})$
($\Theta(\ldots)$: $w_S$ is the winner; $S \sigma^\mu = \pm 1$)
only the winner is updated, according to the class membership
numerical integration for $w_S(0) = 0$
[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S+}$, $R_{S-}$ versus $\alpha$; theory and simulation (N = 200), $p_+ = 0.2$, ℓ = 1.2, η = 1.2, averaged over 100 independent runs]
[Figure: trajectories of $w_+$, $w_-$ in the ($B_+$, $B_-$)-plane relative to $\ell B_+$, $\ell B_-$; (•) α = 20, 40, ..., 140; legend: optimal decision boundary, ____ asymptotic position]
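The LVQ 1 prescription can also be simulated directly at finite N and evaluated with the generalization error formula in terms of $R_{S\sigma}$ and $Q_{ST}$. The sketch below is an assumption-laden illustration rather than the authors' simulation code: it uses a single run, small random initial prototypes instead of $w_S(0) = 0$ (so that $\varepsilon_g$ is defined from the start), the first two unit vectors as $B_\pm$, and the hypothetical function names `eps_g` and `run_lvq1`.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def eps_g(w, B, ell, p):
    """Generalization error from the projections R and overlaps Q of the prototypes."""
    R = {(s, t): dot(w[s], B[t]) for s in (+1, -1) for t in (+1, -1)}
    Q = {(s, t): dot(w[s], w[t]) for s in (+1, -1) for t in (+1, -1)}
    norm = 2.0 * math.sqrt(Q[+1, +1] - 2.0 * Q[+1, -1] + Q[-1, -1])
    return sum(
        p[s] * phi((Q[s, s] - Q[-s, -s] - 2.0 * ell * (R[s, s] - R[-s, s])) / norm)
        for s in (+1, -1)
    )

def run_lvq1(N=100, ell=1.2, eta=1.2, p_plus=0.2, alpha_max=20, seed=0):
    """Return [(alpha, eps_g)] sampled once per N presented examples."""
    rng = random.Random(seed)
    p = {+1: p_plus, -1: 1.0 - p_plus}
    B = {+1: [1.0 if j == 0 else 0.0 for j in range(N)],
         -1: [1.0 if j == 1 else 0.0 for j in range(N)]}
    w = {s: [rng.gauss(0.0, 1e-3) for _ in range(N)] for s in (+1, -1)}
    curve = []
    for mu in range(1, alpha_max * N + 1):
        sigma = +1 if rng.random() < p_plus else -1
        xi = [ell * b + rng.gauss(0.0, 1.0) for b in B[sigma]]
        d = {s: sum((x - v) ** 2 for x, v in zip(xi, w[s])) for s in (+1, -1)}
        S = +1 if d[+1] < d[-1] else -1   # the winner
        step = eta / N * S * sigma        # attract if the labels agree, repel otherwise
        w[S] = [v + step * (x - v) for v, x in zip(w[S], xi)]
        if mu % N == 0:
            curve.append((mu // N, eps_g(w, B, ell, p)))
    return curve

curve = run_lvq1()
print(curve[0], curve[-1])
```

A single run at N = 100 fluctuates around the theoretical curve; averaging over independent runs, as on the slide, reduces the scatter.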

learning curve ($p_+ = 0.2$, ℓ = 1.2)
[Figure: $\varepsilon_g$ (0.14 to 0.26) versus $\alpha$ (0 to 300); curve labels η = 1.2 and η = 2.0, 0.4, 0.2]
- role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- variable rate $\eta(\alpha)$
- well-defined asymptotics (ODE linear in η): $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$
[Figure: $\varepsilon_g$ (0.14 to 0.26) versus $(\eta\alpha)$ (0 to 50) for $\eta \to 0$; labels: min $\varepsilon_g$, suboptimal]

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} \Theta(d_{-S}^\mu - d_S^\mu) \delta_{S \sigma^\mu} (\xi^\mu - w_S^{\mu-1})$
($\Theta(\ldots)$: winner; $\delta_{S \sigma^\mu}$: correct)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$
[Figure: positions of $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, ℓ = 1.2, η = 1.2]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes ($w_S$ updated only from class S)

[Figure: asymptotic $\varepsilon_g$ versus $p_+$; asymptotics $\eta \to 0$, $(\eta\alpha) \to \infty$]
- optimal classification
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: learning curves $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, ℓ = 1.0, η = 1.0]

Vector Quantization
competitive learning
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \Theta(d_{-s}^\mu - d_s^\mu) (\xi^\mu - w_s^{\mu-1})$
($\Theta(\ldots)$: $w_s$ is the winner)
class membership is unknown or identical for all data
numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, ℓ = 1.0, η = 1.2)
[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+, LVQ1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300)]
system is invariant under exchange of the prototypes
→ weakly repulsive fixed points

interpretations
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\varepsilon_g$ versus $p_+$; asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$)]
$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high generalization error $\varepsilon_g$

work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Summary
• prototype-based learning:
  Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes;
  dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology-preserving map
• neighborhood preserving: SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure
  $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i (w_{s,i} - \xi_i)^2$, with the $\lambda_i$ adapted in training
• applications

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[Figure: healthy cells and damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms
- are often based on purely heuristic arguments,
  or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
  which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
  dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
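
model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma P(\xi | \sigma)$
$P(\xi | \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} (\xi - \ell B_\sigma)^2 \right]$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+$, $p_-$ with $p_+ + p_- = 1$
[Figure: sketch of the clusters centered at $\ell B_+$ ($p_+$) and $\ell B_-$ ($p_-$); separation controlled by ℓ]
independent components:
$\langle \xi_j \rangle_\sigma = \ell (B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$, so that $\langle \xi^2 \rangle_\sigma = N + \ell^2 \approx N$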
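Drawing examples from this mixture takes two steps: pick the class with probability $p_+$, then add unit-variance Gaussian noise to the scaled center. A minimal sketch, assuming the first two unit vectors as $B_+$ and $B_-$ (any orthonormal pair would do); the function names are illustrative.

```python
import random

def make_centers(n):
    """Two orthonormal center vectors B_+ and B_- (here: the first two unit vectors)."""
    b_plus = [0.0] * n
    b_minus = [0.0] * n
    b_plus[0] = 1.0
    b_minus[1] = 1.0
    return {+1: b_plus, -1: b_minus}

def sample_example(centers, ell, p_plus, rng):
    """Draw (xi, sigma): sigma = +1 with probability p_plus,
    xi = ell * B_sigma + independent unit-variance Gaussian noise per component."""
    sigma = +1 if rng.random() < p_plus else -1
    xi = [ell * b + rng.gauss(0.0, 1.0) for b in centers[sigma]]
    return xi, sigma

rng = random.Random(0)
centers = make_centers(200)
data = [sample_example(centers, ell=1.0, p_plus=0.6, rng=rng) for _ in range(400)]

n_plus = sum(1 for _, s in data if s == +1)
mean_sq = sum(sum(x * x for x in xi) for xi, _ in data) / len(data)
print(n_plus, round(mean_sq, 1))  # mean_sq should be close to N + ell^2 = 201
```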

high-dimensional data (formally $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, N = 200, ℓ = 1, $p_+ = 0.6$
[Figure: projections into the plane of center vectors $B_+$, $B_-$: $y_\pm = B_\pm \cdot \xi^\mu$; class sizes (240) and (160)]
[Figure: projections in two independent random directions $w_{1,2}$: $x_{1,2} = w_{1,2} \cdot \xi^\mu$; class sizes (240) and (160)]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

dynamics of on-line training
sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\xi)$
update of prototype vectors:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s[d_+^\mu, d_-^\mu, \sigma^\mu, \ldots] (\xi^\mu - w_s^{\mu-1})$
with $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$ and $s, \sigma \in \{+1, -1\}$
- η: learning rate, step size
- $f_s$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data
examples from above:
- unsupervised Vector Quantization: $f_s = \Theta(d_{-s}^\mu - d_s^\mu)$
  "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s = s \sigma^\mu$, i.e. $+1$ (correct class) or $-1$ (wrong class)
  here: two prototypes, no explicit competition
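All prescriptions discussed in the talk share the same step and differ only in the modulation function $f_s$. The sketch below makes that explicit for two prototypes labelled $s = \pm 1$; the rule names (`"vq"`, `"lvq2.1"`, `"lvq1"`, `"lvq+"`) and function names are illustrative, and ties $d_+ = d_-$ are resolved arbitrarily since $\Theta(0)$ is not specified.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def modulation(rule, s, sigma, d):
    """f_s for prototype label s, example label sigma, distances d = {+1: d_+, -1: d_-}."""
    is_winner = 1.0 if d[s] < d[-s] else 0.0  # Theta(d_{-s} - d_s)
    if rule == "vq":      # winner takes all, labels ignored
        return is_winner
    if rule == "lvq2.1":  # both prototypes move, sign s * sigma
        return float(s * sigma)
    if rule == "lvq1":    # only the winner moves, sign s * sigma
        return is_winner * s * sigma
    if rule == "lvq+":    # only a winner with the correct label moves
        return is_winner * (1.0 if s == sigma else 0.0)
    raise ValueError(rule)

def online_step(w, xi, sigma, eta, rule):
    """One update w_s <- w_s + (eta / N) * f_s * (xi - w_s) for both prototypes."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    f = {s: modulation(rule, s, sigma, d) for s in (+1, -1)}
    return {
        s: [wj + eta / n * f[s] * (xj - wj) for wj, xj in zip(w[s], xi)]
        for s in (+1, -1)
    }

w0 = {+1: [0.0, 0.0], -1: [2.0, 0.0]}
print(online_step(w0, [0.5, 0.0], sigma=+1, eta=1.0, rule="lvq2.1"))
```

In the usage line the example carries label +1, so the LVQ 2.1 step moves $w_+$ towards it and pushes $w_-$ away.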
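
mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the ($B_+$, $B_-$)-plane
$Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$, $s, t, \sigma \in \{+1, -1\}$: length and relative position of prototypes
random vector $\xi^\mu$ enters only in the form of
projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$
distances: $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
recursions, obtained from $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s[d_+^\mu, d_-^\mu, \sigma^\mu, \ldots] (\xi^\mu - w_s^{\mu-1})$:
$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta f_s (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$
$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta f_s (x_t^\mu - Q_{st}^{\mu-1}) + \eta f_t (x_s^\mu - Q_{st}^{\mu-1}) + \eta^2 f_s f_t + O(1/N)$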
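The reduction from $2N$ prototype components to seven numbers can be checked numerically: one update step carried out in $N$ dimensions gives the same $R_{s\sigma}$ and $Q_{st}$ as the recursions written in terms of the projections. In the sketch below the second-order term is kept exactly as $(\eta/N)^2 f_s f_t (\xi - w_s) \cdot (\xi - w_t)$; dividing by $1/N$ and using $\xi^2 / N \to 1$ turns it into $\eta^2 f_s f_t + O(1/N)$. The concrete values (N, η, the choice of $f_s$) are arbitrary illustrations.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
N, eta = 50, 0.7
B = {+1: [1.0 if j == 0 else 0.0 for j in range(N)],
     -1: [1.0 if j == 1 else 0.0 for j in range(N)]}
w = {s: [rng.gauss(0.0, 0.3) for _ in range(N)] for s in (+1, -1)}
xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
f = {+1: 1.0, -1: -1.0}  # any modulation values; here LVQ 2.1 for an example with sigma = +1
pairs = [(s, t) for s in (+1, -1) for t in (+1, -1)]

# Direct update in N dimensions, then recompute the characteristic quantities.
w_new = {s: [wj + eta / N * f[s] * (xj - wj) for wj, xj in zip(w[s], xi)] for s in (+1, -1)}
R_direct = {(s, t): dot(w_new[s], B[t]) for s, t in pairs}
Q_direct = {(s, t): dot(w_new[s], w_new[t]) for s, t in pairs}

# The same step expressed through projections only.
x = {s: dot(w[s], xi) for s in (+1, -1)}
y = {t: dot(B[t], xi) for t in (+1, -1)}
xi2 = dot(xi, xi)
R = {(s, t): dot(w[s], B[t]) for s, t in pairs}
Q = {(s, t): dot(w[s], w[t]) for s, t in pairs}
R_rec = {(s, t): R[s, t] + eta / N * f[s] * (y[t] - R[s, t]) for s, t in pairs}
Q_rec = {
    (s, t): Q[s, t]
    + eta / N * (f[s] * (x[t] - Q[s, t]) + f[t] * (x[s] - Q[s, t]))
    + (eta / N) ** 2 * f[s] * f[t] * (xi2 - x[s] - x[t] + Q[s, t])
    for s, t in pairs
}

max_err_R = max(abs(R_direct[k] - R_rec[k]) for k in pairs)
max_err_Q = max(abs(Q_direct[k] - Q_rec[k]) for k in pairs)
print(max_err_R, max_err_Q)  # both at the level of floating-point rounding
```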
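
random vector $\xi^\mu$ acc. to $P(\xi^\mu | \sigma)$: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\tau^\mu = B_\tau \cdot \xi^\mu$
in the thermodynamic limit $N \to \infty$: correlated Gaussian random quantities,
completely specified in terms of first and second moments (w/o indices μ):
$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell R_{s\sigma}$
$\langle y_\tau \rangle_\sigma = \ell$ if $\tau = \sigma$, $0$ else
$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$
$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$
$\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$
2. average over the current example
averaged recursions closed in $\{R_{s\sigma}, Q_{st}\}$: $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$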
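The conditional moments of the projections can be checked by sampling: for fixed prototypes, the empirical mean, variance and covariance of $x_+$, $x_-$ over examples from one cluster should match $\ell R_{s\sigma}$, $Q_{ss}$ and $Q_{+-}$. This is a Monte Carlo sketch with arbitrary illustrative parameters (N = 30, 4000 samples, random prototypes). Because the model data are themselves Gaussian, the moments hold at any N; the limit $N \to \infty$ is what makes the projections Gaussian for more general independent components.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(2)
N, ell, sigma, n_samples = 30, 1.5, +1, 4000
B = {+1: [1.0 if j == 0 else 0.0 for j in range(N)],
     -1: [1.0 if j == 1 else 0.0 for j in range(N)]}
w = {s: [rng.gauss(0.0, 0.2) for _ in range(N)] for s in (+1, -1)}
R = {(s, t): dot(w[s], B[t]) for s in (+1, -1) for t in (+1, -1)}
Q = {(s, t): dot(w[s], w[t]) for s in (+1, -1) for t in (+1, -1)}

# Sample the projections (x_+, x_-) for examples drawn from cluster sigma.
samples = []
for _ in range(n_samples):
    xi = [ell * b + rng.gauss(0.0, 1.0) for b in B[sigma]]
    samples.append((dot(w[+1], xi), dot(w[-1], xi)))

mean_p = sum(a for a, _ in samples) / n_samples
mean_m = sum(b for _, b in samples) / n_samples
var_p = sum((a - mean_p) ** 2 for a, _ in samples) / n_samples
cov_pm = sum((a - mean_p) * (b - mean_m) for a, b in samples) / n_samples

print("mean x_+ :", round(mean_p, 3), "vs", round(ell * R[+1, sigma], 3))
print("var  x_+ :", round(var_p, 3), "vs", round(Q[+1, +1], 3))
print("cov x_+x_-:", round(cov_pm, 3), "vs", round(Q[+1, -1], 3))
```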
The Dynamics of Learning Vector Quantization RUG 10012005
model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σN2
-2
1exp
2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
separation ℓ ℓ
jj Bσσξ
22222 Nξ1ξξN
1σσ
j
jjj ξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Ninfin)
400 examples ξμ isinℝN N=200 ℓ=1 p+=06μ
B
yξ
(240)(160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
(240)(160)
projections in two independent random directions w12
μ 11x ξw
model for studying typical behavior of LVQ algorithmsnot density-estimation based classificationNote
The Dynamics of Learning Vector Quantization RUG 10012005
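dynamics of on-line training
sequence of independent random data $\{\boldsymbol{\xi}^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$, drawn according to $P(\boldsymbol{\xi})$
update of prototype vectors:
$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, S, \sigma^\mu] \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$, with $d_s^\mu = (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})^2$ and $S, \sigma^\mu = \pm 1$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data
above examples:
- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_s = \Theta(d_{-s}^\mu - d_s^\mu)$
- Learning Vector Quantization "2.1": $f_s = S \sigma^\mu = +1$ (correct class) or $-1$ (wrong class); here: two prototypes, no explicit competition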
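The generic update and its two example modulation functions can be sketched as follows. This is an illustrative implementation assuming NumPy and two prototypes stored in a dict keyed by their label S = ±1; the names `RULES` and `online_step` are hypothetical.

```python
import numpy as np

def heaviside(z):
    """Theta(z) = 1 for z > 0, else 0."""
    return 1.0 if z > 0 else 0.0

# Modulation functions f_S for the prototype with label S, given the squared
# distances d[+1], d[-1] and the class label sigma of the current example.
RULES = {
    "VQ": lambda S, d, sigma: heaviside(d[-S] - d[S]),   # winner takes all
    "LVQ21": lambda S, d, sigma: float(S * sigma),       # +1 correct, -1 wrong
}

def online_step(w, xi, sigma, eta, rule):
    """One on-line update of both prototypes; returns a new dict."""
    N = xi.shape[0]
    d = {S: float(np.sum((xi - w[S]) ** 2)) for S in (+1, -1)}
    f = RULES[rule]
    return {S: w[S] + (eta / N) * f(S, d, sigma) * (xi - w[S]) for S in (+1, -1)}

# Small deterministic example with N = 4.
w = {+1: np.zeros(4), -1: np.ones(4)}
xi = np.array([1.0, 0.0, 0.0, 0.0])
w_vq = online_step(w, xi, sigma=+1, eta=1.0, rule="VQ")
w_lvq21 = online_step(w, xi, sigma=+1, eta=1.0, rule="LVQ21")
```

In the example, VQ moves only the closer prototype towards the data point, while the LVQ 2.1 step additionally pushes the prototype with the wrong label away from it.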
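mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
$R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma$: projections in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane
$Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu$: length and relative position of prototypes ($s, t, \sigma \in \{+1, -1\}$)
the random vector $\boldsymbol{\xi}^\mu$ enters only in the form of
projections: $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$
distances: $d_s^\mu = (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})^2 = (\boldsymbol{\xi}^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
recursions, following from the update $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, S, \sigma^\mu] \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$:
$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$
$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + O(1/N)$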
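The reduction to characteristic quantities can be checked numerically. The sketch below assumes NumPy, arbitrary prototypes, and unit vectors for B+ and B−; it verifies that the distances follow from the projections and Q alone, and that the recursion for R reproduces a direct update exactly. The values chosen for η and f are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
B = np.eye(N)[:2]                             # rows: B+ and B- (assumed unit vectors)
w = rng.standard_normal((2, N)) / np.sqrt(N)  # rows: w+ and w- (arbitrary)
xi = rng.standard_normal(N)                   # one example

R = w @ B.T  # R[s, sigma] = w_s . B_sigma (4 numbers)
Q = w @ w.T  # Q[s, t] = w_s . w_t (3 independent numbers)
x = w @ xi   # x[s] = w_s . xi
y = B @ xi   # y[sigma] = B_sigma . xi

# Distances computed directly and from the characteristic quantities.
d_direct = np.sum((xi - w) ** 2, axis=1)
d_from_projections = xi @ xi - 2 * x + np.diag(Q)

# One update with hypothetical modulation values f_+ = 1, f_- = -1.
eta, f = 0.5, np.array([1.0, -1.0])
w_new = w + (eta / N) * f[:, None] * (xi - w)
R_direct = w_new @ B.T
R_recursion = R + (eta / N) * f[:, None] * (y[None, :] - R)
```

Both comparisons agree to floating-point precision, since neither identity relies on the limit of large N.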
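2. average over the current example
random vector $\boldsymbol{\xi}^\mu$ drawn according to $P(\boldsymbol{\xi}^\mu \mid \sigma^\mu)$
in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$ and $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$ are
correlated Gaussian random quantities,
completely specified in terms of first and second moments (w/o indices μ):
$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell \, R_{s\sigma}$
$\langle y_\tau \rangle_\sigma = \ell$ if $\tau = \sigma$, 0 else
$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$
$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$
$\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$
averaged recursions closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$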
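The conditional moments of the projections can be compared with a Monte Carlo estimate. The sketch assumes NumPy, two fixed arbitrary prototypes, unit vectors for B+ and B−, and examples from class σ = +1 only; sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, ell, n_samples = 100, 1.0, 20000
B = np.eye(N)[:2]                             # rows: B+ and B- (assumed unit vectors)
w = rng.standard_normal((2, N)) / np.sqrt(N)  # two fixed, arbitrary prototypes
R, Q = w @ B.T, w @ w.T

# Examples from class sigma = +1: xi = ell * B+ plus unit-variance noise.
xi = ell * B[0] + rng.standard_normal((n_samples, N))
x = xi @ w.T                                  # columns: x_+ and x_-

mean_x = x.mean(axis=0)          # compare with ell * R[:, 0]
cov_x = np.cov(x, rowvar=False)  # compare with Q
```

Up to sampling noise, the empirical mean matches ℓ R_{s+} and the empirical covariance matches Q_{st}.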
3. self-averaging properties
characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples, i.e. of learning steps, per degree of freedom
recursions → coupled ordinary differential equations
evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
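Written out, the averaged recursions for R and Q take the following form in the limit N → ∞. This is a sketch that follows from the update rule by replacing the increments per 1/N with derivatives with respect to α and using ⟨ξ²⟩/N → 1; the averages ⟨…⟩ are over the joint Gaussian density of the projections x_s and y_σ.

```latex
\begin{align}
\frac{dR_{s\sigma}}{d\alpha} &= \eta \left( \langle f_s \, y_\sigma \rangle - \langle f_s \rangle \, R_{s\sigma} \right), \\
\frac{dQ_{st}}{d\alpha} &= \eta \left( \langle f_s \, x_t \rangle - \langle f_s \rangle \, Q_{st}
  + \langle f_t \, x_s \rangle - \langle f_t \rangle \, Q_{st} \right)
  + \eta^2 \, \langle f_s f_t \rangle .
\end{align}
```

Each choice of the modulation function f_s then yields its own closed system of seven coupled equations.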
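5. learning curve
probability for misclassification of a novel example:
$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
$\varepsilon_g = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2 \ell \, (R_{\sigma\sigma} - R_{-\sigma\sigma})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$, with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples ($N \to \infty$)
investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions:
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$, i.e. maximize the decrease $-d\varepsilon_g / d\alpha$
- ...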
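A minimal sketch of the generalization error as a function of the characteristic quantities, assuming unit-variance clusters and Gaussian projections (large N). It uses only the standard library; the function names and the dict-based layout of R and Q are hypothetical.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t.

    Keys are labels in {+1, -1}; only Q[(1, 1)], Q[(1, -1)], Q[(-1, -1)] are used.
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        arg = (Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])) / norm
        eps += p[s] * Phi(arg)
    return eps

# Example: prototypes placed exactly at the cluster centers, w_S = ell * B_S.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps_centers = generalization_error(R, Q, ell, p_plus=0.5)
```

For equal priors and prototypes at the centers, the result reduces to Φ(−ℓ/√2), the error of the symmetric decision boundary between two unit-variance clusters whose centers are ℓ√2 apart.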
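optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with
$p_+ \, P(\boldsymbol{\xi} \mid \sigma = +1) = p_- \, P(\boldsymbol{\xi} \mid \sigma = -1)$
[Figure: clusters centered at ℓB+ (p+) and ℓB− (p− > p+) with the optimal separating plane; excess error]
[Figure: minimal ε_g (axis 0 to 0.50) as a function of the prior weight p+ (axis 0 to 1.0), for ℓ = 0, 1, 2]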
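The minimal generalization error can be evaluated in closed form for this model. The sketch below assumes unit-variance clusters centered at ℓB+ and ℓB−, so that only the coordinate along the axis connecting the centers matters; the function name `optimal_error` is hypothetical.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal eps_g for prior weight p_plus and separation ell."""
    p_minus = 1.0 - p_plus
    D = ell * math.sqrt(2.0)  # distance between the centers ell*B+ and ell*B-
    if D == 0.0 or p_plus in (0.0, 1.0):
        # No information in the data: always predict the more frequent class.
        return min(p_plus, p_minus)
    # Position of the optimal plane along the center-to-center axis,
    # measured from the midpoint between the two centers.
    t = math.log(p_minus / p_plus) / D
    return p_plus * Phi(t - D / 2.0) + p_minus * Phi(-t - D / 2.0)

curve = {ell: [optimal_error(p / 10.0, ell) for p in range(11)] for ell in (0.0, 1.0, 2.0)}
```

For ℓ = 0 the curve is min{p+, p−}, peaking at 0.5 for equal priors; increasing ℓ lowers the error for every p+.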
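"LVQ 2.1": update the correct and the wrong winner
$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, (S \sigma^\mu) \, (\boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1})$
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
(analytical) integration for $\mathbf{w}_S(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$):
$R_{S\tau}(\alpha) = \tau \, \ell \, \frac{1 + \tau m}{2m} \left( 1 - e^{-S \eta m \alpha} \right)$, $S, \tau = \pm 1$
for $\alpha \to \infty$ the quantities R and Q of the prototype $\mathbf{w}_-$ diverge, with certain combinations of Q and R remaining finite
[Figure: characteristic quantities (axis −6 to 6) vs. α (0 to 10); theory and simulation (N = 100), p+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs]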
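The exponential behavior of the projections under LVQ 2.1 can be illustrated by integrating the averaged dynamics numerically. The sketch assumes the equation dR_{Sτ}/dα = η S (ℓ τ p_τ − m R_{Sτ}), obtained here by averaging the update with f_S = Sσ over the model density; this derivation and the function names are assumptions of the example, not taken from the talk.

```python
import math

def lvq21_R_closed_form(S, tau, alpha, eta, ell, m):
    """R_{S tau}(alpha) for w_S(0) = 0 and p_pm = (1 pm m) / 2, m > 0."""
    return tau * ell * (1.0 + tau * m) / (2.0 * m) * (1.0 - math.exp(-S * eta * m * alpha))

def lvq21_R_euler(S, tau, alpha, eta, ell, m, steps=100000):
    """Euler integration of dR/dalpha = eta * S * (ell * tau * p_tau - m * R)."""
    p_tau = (1.0 + tau * m) / 2.0
    R, h = 0.0, alpha / steps
    for _ in range(steps):
        R += h * eta * S * (ell * tau * p_tau - m * R)
    return R

# Parameters p+ = 0.8 (m = 0.6), ell = 1, eta = 0.5.
table = {
    (S, tau): (lvq21_R_closed_form(S, tau, 4.0, 0.5, 1.0, 0.6),
               lvq21_R_euler(S, tau, 4.0, 0.5, 1.0, 0.6))
    for S in (+1, -1) for tau in (+1, -1)
}
```

The projections of the prototype representing the more frequent class saturate, while those of the other prototype grow exponentially with α, which is the instability of the algorithm.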
problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
[Figure: clusters $(p_-)$ and $(p_+ > p_-)$]
strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function;
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified;
  slow learning, poor generalization
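"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership
$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-S}^\mu - d_S^\mu) \, S \sigma^\mu \, (\boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1})$
($\Theta(\ldots)$ selects the winner $\mathbf{w}_S$, $S \sigma^\mu = \pm 1$)
numerical integration for $\mathbf{w}_S(0) = 0$
[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{-+}$, $R_{--}$ vs. α; theory and simulation (N = 200), p+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 indep. runs]
[Figure: trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, axes $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell \mathbf{B}_+$ and $\ell \mathbf{B}_-$; (•) α = 20, 40, ..., 140; optimal decision boundary; (____) asymptotic position]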
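A small Monte Carlo simulation of the LVQ 1 rule is sketched below. It assumes NumPy, unit vectors for B+ and B−, and the parameters quoted above with a smaller N and a single run for speed; the function name `simulate_lvq1` and the seed are hypothetical, and a single finite-N run will only fluctuate around the theoretical curves.

```python
import numpy as np

def simulate_lvq1(N=100, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=50.0, seed=0):
    """On-line LVQ 1 with two prototypes; returns R[s, sigma] and Q[s, t]."""
    rng = np.random.default_rng(seed)
    B = np.eye(N)[:2]            # rows: B+ and B- (assumed unit vectors)
    labels = np.array([+1, -1])  # label S of prototype 0 and prototype 1
    w = np.zeros((2, N))         # initialization w_S(0) = 0
    for _ in range(int(alpha_max * N)):
        sigma = 1 if rng.random() < p_plus else -1
        center = B[0] if sigma == 1 else B[1]
        xi = ell * center + rng.standard_normal(N)
        d = np.sum((xi - w) ** 2, axis=1)
        winner = int(np.argmin(d))  # ties are resolved in favour of prototype 0
        # Attract the winner if its label matches sigma, repel it otherwise.
        w[winner] += (eta / N) * labels[winner] * sigma * (xi - w[winner])
    return w @ B.T, w @ w.T

R, Q = simulate_lvq1()
```

After training, each prototype should have the larger projection onto the center vector of its own class, i.e. R_{++} > R_{+-} and R_{--} > R_{-+}.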
learning curve, η = 1.2 (p+ = 0.2, ℓ = 1.2)
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- role of the learning rate
[Figure: ε_g (axis 0.14 to 0.26) vs. α (0 to 300) for η = 2.0, 0.4, 0.2]
- variable rate η(α)
- well-defined asymptotics for η → 0 (ODE linear in η)
[Figure: ε_g (axis 0.14 to 0.26) vs. (ηα) (0 to 50) in the limit η → 0, α → ∞, (ηα) → ∞, compared with min ε_g: suboptimal]
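"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-S}^\mu - d_S^\mu) \, \delta_{S \sigma^\mu} \, (\boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1})$
(the factor $\Theta(\ldots) \, \delta_{S \sigma^\mu}$ selects the case "winner correct")
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell \, (\mathbf{B}_+ + \mathbf{B}_-)/2$
[Figure: asymptotic positions of $\mathbf{w}_+$ and $\mathbf{w}_-$ relative to the cluster centers $\ell \mathbf{B}_+$ and $\ell \mathbf{B}_-$; p+ = 0.2, ℓ = 1.2, η = 1.2]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ updated only from class S)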
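The LVQ+ step can be sketched in a few lines. The example assumes NumPy and two prototypes stored in a dict keyed by their label; the function name `lvq_plus_step` and the toy vectors are hypothetical.

```python
import numpy as np

def lvq_plus_step(w, xi, sigma, eta):
    """One LVQ+ update; w maps the label S in {+1, -1} to a prototype vector."""
    N = xi.shape[0]
    d = {S: float(np.sum((xi - w[S]) ** 2)) for S in (+1, -1)}
    winner = +1 if d[+1] <= d[-1] else -1
    new_w = {S: w[S].copy() for S in (+1, -1)}
    if winner == sigma:  # update only if the winner carries the correct label
        new_w[winner] += (eta / N) * (xi - w[winner])
    return new_w

w = {+1: np.array([1.0, 0.0, 0.0]), -1: np.array([0.0, 1.0, 0.0])}
xi = np.array([0.9, 0.0, 0.0])
moved = lvq_plus_step(w, xi, sigma=+1, eta=3.0)   # winner +1 is correct
frozen = lvq_plus_step(w, xi, sigma=-1, eta=3.0)  # winner +1 is wrong: no step
```

Because a wrong winner is simply ignored, each prototype only ever moves towards examples of its own class.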
- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: ε_g vs. p+, shown together with the optimal classification and $\min\{p_+, p_-\}$]
[Figure: learning curves ε_g vs. α for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]
asymptotics: η → 0, (ηα) → ∞
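Vector Quantization
competitive learning: $\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-S}^\mu - d_S^\mu) \, (\boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1})$ ($\mathbf{w}_S$: winner)
class membership is unknown or identical for all data
numerical integration for $\mathbf{w}_S(0) \approx 0$ (p+ = 0.2, ℓ = 1.0, η = 1.2)
[Figure: ε_g vs. α for VQ, LVQ+ and LVQ1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. α (0 to 300)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points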
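The exchange symmetry of the VQ rule can be verified directly: relabelling the two prototypes before an update gives the same result as relabelling afterwards. The sketch assumes NumPy; `vq_step` is a hypothetical helper and the vectors are random.

```python
import numpy as np

def vq_step(w, xi, eta):
    """Winner-takes-all update; w has shape (2, N), returns an updated copy."""
    N = xi.shape[0]
    d = np.sum((xi - w) ** 2, axis=1)
    winner = int(np.argmin(d))
    w_new = w.copy()
    w_new[winner] += (eta / N) * (xi - w[winner])
    return w_new

rng = np.random.default_rng(4)
N = 20
w = rng.standard_normal((2, N))
xi = rng.standard_normal(N)

updated = vq_step(w, xi, eta=1.0)
updated_swapped = vq_step(w[::-1], xi, eta=1.0)  # prototypes exchanged
```

Since the rule never refers to a label, configurations with two identical prototypes are mapped onto themselves, which is why leaving such a symmetric state requires symmetry breaking.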
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: ε_g vs. p+; asymptotics (η → 0, ηα → ∞)]
p+ ≈ 0, p− ≈ 1:
- low quantization error
- high gen. error ε_g
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x
Bww j
completely specified in terms of first and second moments (wo indices μ)
in the thermodynamic limit N
random vector acc to σ)|P( μ ξμμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
stσtσsσt s Q xx- xx sσσsσ s R yx- yx
yy- yy σσσ
else
σ ifsσσ y
0
S
2 average over the current example
averaged recursions closed in Rsσ Qst p σ1σ
σ
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N (here prop N-1)
μsσ
μst R Q
learning dynamics is completely described in terms of averages
3 self-averaging properties
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions coupled ordinary differential equations
evolution of projections
The Dynamics of Learning Vector Quantization RUG 10012005
probability for misclassification of a novel example
ddpddp gε
QQQ
RR2QQ
QQQ
RR2QQpp
22 2
1
2
1
5 learning curve
generalization error εg(α) after training with α N examples
N
- repulsiveattractive fixed points of the dynamics
- asymptotic behavior for - dependence on learning rate separation initialization-
investigation and comparison of given algorithms
- time-dependent learning rate η(α)
- variational optimization wrt fs[]
-
optimization and development of new prescriptions
maximizeα
g
d
d ε
The Dynamics of Learning Vector Quantization RUG 10012005
optimal classification with minimal generalization error
B-
B+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p ξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weightsℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
(analytical)integrationfor ws(0) = 0
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
p = (1+m ) 2 (mgt0)
[Seo Obermeyer] LVQ21 harr cost function
(likelihood ratios)
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 =05 averages over 100 independent runs
The Dynamics of Learning Vector Quantization RUG 10012005
problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for α → ∞: εg = min{p+, p-} (every example is assigned to the more frequent class)
[Figure: illustration for the case p+ > p-]
strategies
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
limiting case "learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
slow learning, poor generalization
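"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner $w_s$ is updated, according to the class membership
$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_{+s}^{\mu} \right) s\,\sigma^{\mu} \left( \xi^{\mu} - w_s^{\mu-1} \right)$   ($\Theta(\cdot) = 1$ if $w_s$ is the winner; $s\,\sigma^{\mu} = \pm 1$)
numerical integration for w_s(0) = 0
[Figure: Q++, Q--, Q+- and R++, R+-, R-+, R-- vs. α; theory and simulation (N = 200), p+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 independent runs]
[Figure: trajectories of w+ and w- in the (B+, B-)-plane, axes R_{S+} and R_{S-}, with cluster centers ℓB+ and ℓB-; (•) α = 20, 40, ..., 140; optimal decision boundary and asymptotic position marked]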
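The LVQ 1 prescription and the order parameters it drives can be sketched as follows. This is a minimal illustration, assuming unit-variance clusters and B+ = e_0, B- = e_1; the tie-breaking rule at w_s(0) = 0 and all function names are assumptions made here, and N = 100 is chosen smaller than on the slide to keep the run short.

```python
import random

def lvq1_step(w, xi, sigma, eta):
    """LVQ 1: move only the winner, towards xi if labels agree, away otherwise."""
    dist = {s: sum((x - wi) ** 2 for x, wi in zip(xi, w[s])) for s in (1, -1)}
    s = 1 if dist[1] <= dist[-1] else -1  # ties go to w_+ (arbitrary choice)
    factor = eta / len(xi) * s * sigma
    w[s] = [wi + factor * (x - wi) for wi, x in zip(w[s], xi)]
    return s

def order_parameters(w):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t."""
    R = {(s, 1): w[s][0] for s in (1, -1)}          # B+ = e_0 (assumption)
    R.update({(s, -1): w[s][1] for s in (1, -1)})   # B- = e_1 (assumption)
    Q = {(s, t): sum(a * b for a, b in zip(w[s], w[t]))
         for s in (1, -1) for t in (1, -1)}
    return R, Q

def run_lvq1(n_dim=100, p_plus=0.2, ell=1.2, eta=1.2, alpha_max=50.0, seed=1):
    """Train from w_s(0) = 0 with mu = alpha_max * N examples."""
    rng = random.Random(seed)
    w = {1: [0.0] * n_dim, -1: [0.0] * n_dim}
    for _ in range(int(alpha_max * n_dim)):
        sigma = 1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        xi[0 if sigma == 1 else 1] += ell
        lvq1_step(w, xi, sigma, eta)
    return order_parameters(w)

R, Q = run_lvq1()
print("R:", R)
print("Q:", Q)
```

Recording R and Q at several values of α = μ/N and averaging over independent runs would give the kind of simulation data that the slide compares with the integrated differential equations.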
learning curve: η = 1.2 (p+ = 0.2, ℓ = 1.2)
- role of the learning rate
[Figure: εg vs. α (0 to 300) for η = 2.0, 0.4, 0.2]
- stationary state: εg(α → ∞) grows linearly with η
- variable rate η(α)
- well-defined asymptotics for η → 0, α → ∞ with (ηα) → ∞ (ODE linear in η)
[Figure: εg vs. (ηα) (0 to 50) in this limit, compared with min εg: the asymptotic result is suboptimal]
"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion): update only if the winner is correct
$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_{+s}^{\mu} \right) \delta_{s,\sigma^{\mu}} \left( \xi^{\mu} - w_s^{\mu-1} \right)$
α → ∞: asymptotic configuration symmetric about ℓ(B+ + B-)/2
[Figure: asymptotic positions of w+ and w- relative to ℓB+ and ℓB-; p+ = 0.2, ℓ = 1.2, η = 1.2]
classification scheme and the achieved generalization error are independent of the prior weights p± (and optimal for p± = 1/2)
LVQ+ ≈ VQ within the classes ($w_s$ updated only from class s)
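The difference between LVQ+ and LVQ 1 lies only in what happens when the winner carries the wrong label. A minimal sketch of a single LVQ+ step, with a hypothetical function name and a dictionary of two prototypes keyed by their labels ±1:

```python
def lvq_plus_step(w, xi, sigma, eta):
    """LVQ+: the winner moves towards xi only if its label equals sigma.

    Returns True if an update was made, False if the example was skipped.
    """
    dist = {s: sum((x - wi) ** 2 for x, wi in zip(xi, w[s])) for s in (1, -1)}
    s = 1 if dist[1] <= dist[-1] else -1  # winner; ties go to w_+ (arbitrary)
    if s != sigma:
        return False                      # wrong winner: no repulsion step
    factor = eta / len(xi)
    w[s] = [wi + factor * (x - wi) for wi, x in zip(w[s], xi)]
    return True

# Two-dimensional toy example (illustration only).
w = {1: [0.0, 0.0], -1: [4.0, 4.0]}
moved = lvq_plus_step(w, [1.0, 1.0], 1, 1.0)     # winner w_+ is correct
skipped = lvq_plus_step(w, [1.0, 1.0], -1, 1.0)  # winner w_+ is wrong
print(moved, skipped, w)
```

Because each prototype is only ever attracted by examples of its own class for which it is the winner, the rule behaves like VQ restricted to the classes, as stated above.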
asymptotics: η → 0, (ηα) → ∞
[Figure: asymptotic εg as a function of p+ for LVQ 2.1, LVQ 1 and LVQ+, compared with the optimal classification]
- LVQ 2.1: trivial assignment to the more frequent class, εg = min{p+, p-}
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, p±-independent classification
[Figure: learning curves εg vs. α for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]
Vector Quantization
competitive learning: class membership is unknown or identical for all data
$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^{\mu} - d_{+s}^{\mu} \right) \left( \xi^{\mu} - w_s^{\mu-1} \right)$   ($w_s$: winner)
numerical integration for w_s(0) ≈ 0 (p+ = 0.2, ℓ = 1.0, η = 1.2)
[Figure: εg vs. α for VQ, LVQ+, LVQ1]
[Figure: R++, R+-, R-+, R-- vs. α (0 to 300)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points
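Unsupervised competitive learning ignores the labels and only reduces the quantization error. The sketch below is an illustration under assumptions made here: unit-variance clusters, B+ = e_0 and B- = e_1, a small random initialization for w_s(0) ≈ 0, and hypothetical parameter values (N = 20, ℓ = 2, η = 0.1) chosen so that the effect is clearly visible. It measures the mean squared distance to the closest prototype on a fixed test set before and after training.

```python
import random

def closest(w, xi):
    """Return (index, squared distance) of the prototype closest to xi."""
    dists = [sum((x - wi) ** 2 for x, wi in zip(xi, proto)) for proto in w]
    j = min(range(len(w)), key=dists.__getitem__)
    return j, dists[j]

def vq_step(w, xi, eta):
    """Move the winner towards the example; labels play no role."""
    j, _ = closest(w, xi)
    factor = eta / len(xi)
    w[j] = [wi + factor * (x - wi) for wi, x in zip(w[j], xi)]
    return j

def quantization_error(w, data):
    """Mean squared distance of the examples to their closest prototype."""
    return sum(closest(w, xi)[1] for xi in data) / len(data)

def draw(rng, n_dim, p_plus, ell):
    """One unlabelled example from the two-cluster mixture."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
    xi[0 if rng.random() < p_plus else 1] += ell  # assumption: B+ = e_0, B- = e_1
    return xi

def run_vq(n_dim=20, p_plus=0.2, ell=2.0, eta=0.1, alpha_max=100.0, seed=2):
    rng = random.Random(seed)
    test_set = [draw(rng, n_dim, p_plus, ell) for _ in range(1000)]
    w = [[1e-3 * rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(2)]
    before = quantization_error(w, test_set)
    for _ in range(int(alpha_max * n_dim)):
        vq_step(w, draw(rng, n_dim, p_plus, ell), eta)
    return before, quantization_error(w, test_set)

err_before, err_after = run_vq()
print("quantization error before:", err_before, "after:", err_after)
```

A lower quantization error says nothing about which prototype represents which class, which is why the generalization error of VQ can remain high.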
interpretations
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: εg as a function of p+; asymptotics (η → 0)]
p+ ≈ 0, p- ≈ 1: low quantization error, but high generalization error εg
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning
Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes
dynamics of online training
• comparison of algorithms
LVQ 2.1: instability, trivial (stationary) classification
LVQ 1: close to optimal asymptotic generalization
LVQ+: min-max solution w.r.t. asymptotic generalization
VQ: symmetry breaking, representation
Perspectives
• Self-Organizing Maps (SOM)
(many) N-dim. prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, with the $\lambda_i$ adapted in training
• applications
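The relevance-weighted distance mentioned for Generalized Relevance LVQ is a one-line change to the squared Euclidean distance. A minimal sketch, with a hypothetical function name; how the relevances λ_i are adapted during training is not covered here.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)^2."""
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))

prototype = [1.0, 2.0]
example = [0.0, 0.0]
print(relevance_distance(prototype, example, [1.0, 1.0]))  # plain squared Euclidean
print(relevance_distance(prototype, example, [1.0, 0.0]))  # second feature ignored
```

Setting a relevance to zero removes the corresponding feature from the winner determination, which is how an adaptive metric can suppress uninformative dimensions.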
learning curve
εg =12
(p+=02 ℓ=12)
εg (αinfin) grows lin with η
- stationary state
- role of the learning rate
α100 200 300
εg
026
022
018
0140
η
20
04
02
η0 - variable rate η(α)
- well-defined asymptotics
(ODE linear in η)
10
εg
20 30 40 50 0 014
026
022
018
min εg
(η α)
η0η 0 αinfin
( η α ) infin
suboptimal

“The winner takes it all”
II) LVQ+ (only positive steps, without repulsion)

$$\mathbf{w}_S^{\mu} = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-S}^{\mu} - d_{+S}^{\mu}\right)\delta_{S,\sigma^{\mu}}\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_S^{\mu-1}\right)$$

(Θ term: w_S is the winner; δ term: the winner's class is correct)
α → ∞: asymptotic configuration
symmetric about ℓ (B+ + B-)/2
[Figure: prototypes w- and w+ relative to ℓ B+ and ℓ B-; p+ = 0.2, ℓ = 1.2, = 1.2]
classification scheme and the
achieved generalization error are
independent of the prior weights p
(and optimal for p = 1/2)
LVQ+ ≈ VQ within the classes
(w_S updated only from class S)
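
The LVQ+ step differs from plain VQ only in the extra condition that the winner must carry the label of the example. A minimal sketch, assuming squared Euclidean distance and one label per prototype; the name `lvq_plus_step` and the argument names are hypothetical.

```python
def lvq_plus_step(prototypes, proto_labels, xi, sigma, eta):
    """One on-line LVQ+ step (hypothetical helper).

    The winner is the prototype closest to xi. It is moved towards xi
    only if its label equals the example's label sigma; otherwise
    nothing changes (no repulsion). Returns True if an update was made.
    """
    n = len(xi)  # dimension N of the data
    dists = [sum((x - w) ** 2 for x, w in zip(xi, proto))
             for proto in prototypes]
    winner = dists.index(min(dists))
    if proto_labels[winner] != sigma:
        return False  # winner has the wrong class: no step at all
    prototypes[winner] = [w + (eta / n) * (x - w)
                          for w, x in zip(prototypes[winner], xi)]
    return True


if __name__ == "__main__":
    protos, proto_labels = [[0.0, 0.0], [2.0, 2.0]], [+1, -1]
    print(lvq_plus_step(protos, proto_labels, [0.5, 0.5], -1, 2.0), protos)
    print(lvq_plus_step(protos, proto_labels, [0.5, 0.5], +1, 2.0), protos)
```

Because a prototype is only ever updated by examples of its own class, each prototype effectively performs VQ within its class, as stated on the slide.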

[Figure: asymptotic εg versus p+, compared with the optimal classification]
- LVQ 2.1:
  trivial assignment to the
  more frequent class, εg = min{p+, p-}
- LVQ 1:
  here close to optimal
  classification
- LVQ+:
  min-max solution,
  p±-independent classification
[Figure: learning curves εg versus α for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, = 1.0]
asymptotics: η → 0, (ηα) → ∞