Asymptotic Behavior of Stochastic Complexity of Complete Bipartite Graph-Type Boltzmann Machines
Yu Nishiyama and Sumio Watanabe
Tokyo Institute of Technology, Japan
Background
Learning machines: mixture models, hidden Markov models, Bayesian networks
Information systems: pattern recognition, natural language processing, gene analysis
Mathematically, these learning machines are singular statistical models.
Bayes learning is effective for singular statistical models.
Problem: calculations that involve a Bayes posterior require a huge computational cost.
Mean field approximation: the Bayes posterior is approximated by a trial distribution.
Stochastic complexity: accuracy of the approximation; difference from regular statistical models; model selection.
The asymptotic behavior of the mean field stochastic complexity has been studied for:
Mixture models [ K. Watanabe, et al. 2004. ]
Reduced rank regressions [ Nakajima, et al. 2005. ]
Hidden Markov models [ Hosino, et al. 2005. ]
Stochastic context-free grammar [ Hosino, et al. 2005. ]
Neural networks [ Nakano, et al. 2005. ]
Purpose
We derive an upper bound on the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines.
Boltzmann machines: graphical models, spin systems
Table of Contents
Review
  Bayes Learning
  Mean Field Approximation
  Boltzmann Machines (Complete Bipartite Graph-type)
Main Theorem
Outline of the Proof
Discussion and Conclusion
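Bayes Learning
Training data: $X^n = (X_1, X_2, \ldots, X_n)$
True distribution: $q(x)$
Model: $p(x|\theta)$
Prior: $\varphi(\theta)$
Bayes posterior:
$$p(\theta|X^n) = \frac{1}{Z(X^n)}\,\varphi(\theta)\prod_{i=1}^{n} p(X_i|\theta)$$
Bayes predictive distribution:
$$p(x|X^n) = \int p(x|\theta)\,p(\theta|X^n)\,d\theta$$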
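Mean Field Approximation (1)
The Bayes posterior can be rewritten as
$$p(\theta|X^n) = \frac{1}{Z(X^n)}\,\varphi(\theta)\prod_{i=1}^{n} p(X_i|\theta) = \frac{1}{Z_0(X^n)}\exp\{-n\tilde{H}_n(\theta)\}\,\varphi(\theta),$$
where $\tilde{H}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\log\frac{q(X_i)}{p(X_i|\theta)}$ is the empirical Kullback information and $Z_0(X^n) = Z(X^n)/\prod_{i=1}^{n} q(X_i)$.
We consider the Kullback distance from a trial distribution $f(\theta)$ to the Bayes posterior $p(\theta|X^n)$:
$$D[f(\theta)\,\|\,p(\theta|X^n)] = \int f(\theta)\log\frac{f(\theta)}{\frac{1}{Z_0(X^n)}\exp\{-n\tilde{H}_n(\theta)\}\,\varphi(\theta)}\,d\theta \ge 0.$$
Hence
$$-\log Z_0(X^n) \le \int f(\theta)\log\frac{f(\theta)}{\varphi(\theta)}\,d\theta + n\int f(\theta)\,\tilde{H}_n(\theta)\,d\theta.$$
Mean Field Approximation (2)
Taking the expectation over the training data gives
$$E_{X^n}[-\log Z_0(X^n)] \le E_{X^n}\left[\int f(\theta)\log\frac{f(\theta)}{\varphi(\theta)}\,d\theta + n\int f(\theta)\,\tilde{H}_n(\theta)\,d\theta\right].$$
When we restrict the trial distribution $f(\theta)$ to the factorized form
$$f(\theta) = \prod_{i=1}^{d} f_i(\theta_i),$$
the $f(\theta)$ which minimizes the right-hand side is called the mean field approximation. The minimum value,
$$\bar{F}(n) \equiv E_{X^n}\left[\min_{f}\left\{\int f(\theta)\log\frac{f(\theta)}{\varphi(\theta)}\,d\theta + n\int f(\theta)\,\tilde{H}_n(\theta)\,d\theta\right\}\right],$$
is called the mean field stochastic complexity.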
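The inequality behind the mean field approximation can be checked numerically: because the Kullback distance from any trial distribution to the Bayes posterior is non-negative, the functional (sum of f log(f/prior) plus n times the f-average of the empirical Kullback information) never falls below minus the log of the normalizing constant, with equality at the posterior itself. The following is a minimal sketch on a discretized parameter grid. The grid, the uniform prior, the value n = 50, and the quadratic toy function standing in for the empirical Kullback information are all assumptions made for illustration and do not come from the slides.

```python
import math
import random

def free_energy_bound(f, prior, H, n):
    """Discrete analogue of: sum f log(f / prior) + n * sum f * H."""
    return sum(fk * (math.log(fk / pk) + n * hk)
               for fk, pk, hk in zip(f, prior, H) if fk > 0)

random.seed(1)
n = 50
grid = [k / 10 for k in range(-20, 21)]        # hypothetical 1-D parameter grid
prior = [1 / len(grid)] * len(grid)            # assumed uniform prior on the grid
H = [0.5 * (t - 0.3) ** 2 for t in grid]       # toy stand-in for the Kullback term

# Posterior proportional to prior * exp(-n H), with normalizing constant Z0.
weights = [p * math.exp(-n * h) for p, h in zip(prior, H)]
Z0 = sum(weights)
posterior = [v / Z0 for v in weights]

# An arbitrary trial distribution on the same grid.
raw = [random.random() for _ in grid]
trial = [r / sum(raw) for r in raw]

exact = -math.log(Z0)
bound_trial = free_energy_bound(trial, prior, H, n)
bound_post = free_energy_bound(posterior, prior, H, n)
print(exact, bound_post, bound_trial)
```

In this sketch `bound_post` coincides with `exact` up to rounding, while `bound_trial` is larger; restricting the trial distribution to a factorized family can only raise the minimum of the bound.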
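Complete Bipartite Graph-type Boltzmann Machines
[Figure: complete bipartite graph with K hidden units $y_1, y_2, y_3, \ldots, y_K$ and M units $x_1, x_2, \ldots, x_M$; every $y_i$ is connected to every $x_j$ by a weight $w_{ij}$, up to $w_{KM}$.]
Units: $\{x_j\}_{j=1}^{M}$ and $\{y_i\}_{i=1}^{K}$; each unit takes a value in $\{1, -1\}$.
Parametric model:
$$p(x|w) = \frac{\sum_{y}\exp\left(\sum_{i=1}^{K}\sum_{j=1}^{M} w_{ij}x_j y_i\right)}{\sum_{x}\sum_{y}\exp\left(\sum_{i=1}^{K}\sum_{j=1}^{M} w_{ij}x_j y_i\right)} = \frac{\prod_{i=1}^{K}\sum_{y_i}\exp\left(\sum_{j=1}^{M} w_{ij}x_j y_i\right)}{Z(w)} = \frac{\prod_{i=1}^{K}\cosh\left(\sum_{j=1}^{M} w_{ij}x_j\right)}{Z(w)}$$
Here $Z(w)$ denotes the normalizing constant of each expression; the factor $2^K$ arising from $\sum_{y_i}\exp(a y_i) = 2\cosh(a)$ is absorbed into it.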
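The step from the sum over hidden configurations to the product of hyperbolic cosines can be verified by brute force on a small machine. The sketch below enumerates all hidden states in {1, -1}^K and compares the resulting distribution over the visible units with the cosh form. The sizes K = 2 and M = 3, the random weights, and all function names are hypothetical choices for illustration.

```python
import itertools
import math
import random

def hidden_sum(x, w):
    """Sum of exp(sum_ij w_ij x_j y_i) over all hidden states y in {1,-1}^K."""
    K = len(w)
    total = 0.0
    for y in itertools.product((1, -1), repeat=K):
        energy = sum(w[i][j] * x[j] * y[i]
                     for i in range(K) for j in range(len(x)))
        total += math.exp(energy)
    return total

def cosh_product(x, w):
    """Closed form: product over hidden units of 2 * cosh(sum_j w_ij x_j)."""
    return math.prod(2.0 * math.cosh(sum(wij * xj for wij, xj in zip(row, x)))
                     for row in w)

def visible_distribution(w, unnormalized):
    """Normalize an unnormalized weight over all visible states x in {1,-1}^M."""
    M = len(w[0])
    xs = list(itertools.product((1, -1), repeat=M))
    weights = [unnormalized(x, w) for x in xs]
    z = sum(weights)
    return {x: v / z for x, v in zip(xs, weights)}

random.seed(0)
K, M = 2, 3                                   # hypothetical small sizes
w = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(K)]
p_sum = visible_distribution(w, hidden_sum)
p_cosh = visible_distribution(w, cosh_product)
max_gap = max(abs(p_sum[x] - p_cosh[x]) for x in p_sum)
print(max_gap)
```

The two distributions agree up to floating-point rounding, since the constant factor 2^K cancels in the normalization.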
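True Distribution
[Figure: the same bipartite graph with hidden units $y_1, \ldots, y_{K_0}, y_{K_0+1}, \ldots, y_K$ (K units) and units $x_1, x_2, \ldots, x_M$ (M units); weights into $y_1, \ldots, y_{K_0}$ satisfy $w_{ij} \neq 0$ and weights into $y_{K_0+1}, \ldots, y_K$ satisfy $w_{ij} = 0$.]
We assume that the true distribution is included in the parametric model $p(x|w)$ and that its number of hidden units is $K_0$ $(K_0 \le K)$.
The true distribution is
$$p(x|w^*) = \frac{\prod_{i=1}^{K_0}\cosh\left(\sum_{j=1}^{M} w^*_{ij}x_j\right)}{Z(w^*)}.$$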
Main Theorem
The mean field stochastic complexity of complete bipartite graph-type Boltzmann machines has the following upper bound:
$$\bar{F}(n) \le \frac{MK + MK_0}{4}\log n + C$$
M: the number of input and output units
K: the number of hidden units (learning machine)
K_0: the number of hidden units (true distribution)
C: constant
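To get a feel for the size of the bound, the coefficient of log n can be compared with the coefficient KM/2 that the slides quote for a regular statistical model. The sketch below assumes the coefficient (MK + MK0)/4, where K0 denotes the number of hidden units of the true distribution. The function names and the example sizes are hypothetical.

```python
def mean_field_coefficient(M, K, K0):
    """Coefficient of log n in the mean field upper bound: (M*K + M*K0) / 4."""
    if not 0 <= K0 <= K:
        raise ValueError("the true number of hidden units K0 must satisfy 0 <= K0 <= K")
    return (M * K + M * K0) / 4

def regular_coefficient(M, K):
    """Coefficient of log n for a regular model with K*M parameters: K*M / 2."""
    return M * K / 2

# Hypothetical sizes: 10 visible units, 5 hidden units, 2 of them truly needed.
M, K, K0 = 10, 5, 2
mean_field = mean_field_coefficient(M, K, K0)   # 17.5
regular = regular_coefficient(M, K)             # 25.0
print(mean_field, regular)
```

The two coefficients coincide when K0 = K, and the mean field coefficient is strictly smaller whenever the learning machine has redundant hidden units.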
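Outline of the Proof (Methods)
The minimum over trial distributions is bounded by the value at a particular trial distribution $\tilde{f}(w)$:
$$\bar{F}(n) \le \int \tilde{f}(w)\log\frac{\tilde{f}(w)}{\varphi(w)}\,dw + n\int \tilde{f}(w)\,H(w)\,dw,$$
where $H(w)$ is the Kullback information from the true distribution to $p(x|w)$ (the expectation of $\tilde{H}_n(w)$ over the training data), which depends on the Boltzmann machine.
Prior:
$$\varphi(w) = \frac{1}{(2\pi)^{KM/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^{K}\sum_{j=1}^{M}(w_{ij}-\hat{w}_{ij})^2\right\}$$
Trial distribution (normal distribution family):
$$\tilde{f}(w) = \frac{1}{Z(N)}\exp\left\{-\sum_{i=1}^{K}\sum_{j=1}^{M} N_{ij}(w_{ij}-\hat{w}_{ij})^2\right\}$$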
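Outline of the Proof [Lemma]
For the Kullback information $H(\theta)$ of a parameter $\theta \in R^d$, if there exists a value $\hat{\theta}$ such that $H(\hat{\theta}) = 0$ and the number of elements of the set
$$\left\{\, i \;;\; \frac{\partial^2 H}{\partial\theta_i^2}(\hat{\theta}) \neq 0 \,\right\}$$
is less than or equal to $r$, then the mean field stochastic complexity has the following upper bound:
$$\bar{F}(n) \le \frac{d + r}{4}\log n + O(1).$$
[Figure: the $d \times d$ Hessian matrix, with at most $r$ non-zero diagonal elements and the remaining diagonal elements equal to 0.]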
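One heuristic way to read a coefficient of the form (d + r)/4 is as a per-coordinate count: a normal trial distribution of width s costs roughly -log s in the entropy term, while the energy term grows like n s^2 in a direction with non-zero curvature and, as an assumption of this sketch, like n s^4 in a direction where the second derivative vanishes. Minimizing over s then gives (1/2) log n per curved direction and (1/4) log n per flat one, so r curved and d - r flat directions add up to (d + r)/4. The code below checks these two slopes numerically. It is an illustration under the stated assumptions, not the proof used in the slides, and the function names and the values of d and r are hypothetical.

```python
import math

def min_cost(n, power, iters=200):
    """Minimize -log(s) + n * s**power over s > 0 by ternary search on u = log(s).

    The objective -u + n * exp(power * u) is convex in u, so ternary search applies.
    """
    def cost(u):
        return -u + n * math.exp(power * u)
    lo, hi = -40.0, 5.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if cost(a) < cost(b):
            hi = b
        else:
            lo = a
    return cost((lo + hi) / 2)

def log_n_slope(power, n1=1e4, n2=1e8):
    """Growth rate of the minimal cost per unit of log n, measured between n1 and n2."""
    return (min_cost(n2, power) - min_cost(n1, power)) / (math.log(n2) - math.log(n1))

slope_curved = log_n_slope(2)   # direction with non-zero second derivative
slope_flat = log_n_slope(4)     # direction where the second derivative vanishes

d, r = 6, 2                     # hypothetical: 6 parameters, 2 curved directions
coefficient = r * slope_curved + (d - r) * slope_flat
print(slope_curved, slope_flat, coefficient, (d + r) / 4)
```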
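We apply this lemma to the Boltzmann machines.
Kullback information is given by
$$H(w) = \sum_{x}\frac{\prod_{i=1}^{K_0}\cosh\left(\sum_{j=1}^{M} w^*_{ij}x_j\right)}{Z(w^*)}\log\frac{\prod_{i=1}^{K_0}\cosh\left(\sum_{j=1}^{M} w^*_{ij}x_j\right)/Z(w^*)}{\prod_{i=1}^{K}\cosh\left(\sum_{j=1}^{M} w_{ij}x_j\right)/Z(w)}.$$
The second order differential is
$$\left.\frac{\partial^2 H(w)}{\partial w_{ij}^2}\right|_{w=\hat{w}} = \left\langle (t - \langle t\rangle_{\hat{w}})^2 \right\rangle_{\hat{w}}.$$
Here
$$t = x_j\tanh\left(\sum_{j'=1}^{M} w_{ij'}x_{j'}\right), \qquad \langle f(x|w)\rangle_{\hat{w}} = \sum_{x} p(x|\hat{w})\,f(x|w).$$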
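The parameter $\hat{w}$ is a true parameter $w^*$, for which $w^*_{ij} = 0$ for $i \in \{K_0+1, \ldots, K\}$.
Then $t$ becomes
$$t = x_j\tanh\left(\sum_{j'=1}^{M} w^*_{ij'}x_{j'}\right) = x_j\tanh(0) = 0 \qquad (i \in \{K_0+1, \ldots, K\}),$$
and
$$\left.\frac{\partial^2 H(w)}{\partial w_{ij}^2}\right|_{w=w^*} = \left\langle (t - \langle t\rangle_{w^*})^2 \right\rangle_{w^*} = 0 \qquad (i \in \{K_0+1, \ldots, K\}).$$
Then $d = KM$ and $r = K_0 M$ hold.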
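The vanishing second derivatives for the redundant hidden units can be checked numerically on a tiny machine. The sketch below computes the Kullback information H(w) exactly by enumerating the visible states, estimates the diagonal second derivatives at the true parameter by central finite differences, and compares them with the variance of t = x_j tanh(sum over j' of w_ij' x_j') under the true distribution. The sizes (M = 2, K = 2, one true hidden unit), the weight values, the step size h, and all function names are hypothetical choices for illustration.

```python
import itertools
import math

def visible_probs(w, M):
    """p(x|w) proportional to prod_i cosh(sum_j w_ij x_j), over x in {1,-1}^M."""
    xs = list(itertools.product((1, -1), repeat=M))
    weights = [math.prod(math.cosh(sum(wij * xj for wij, xj in zip(row, x)))
                         for row in w) for x in xs]
    z = sum(weights)
    return xs, [v / z for v in weights]

def kullback(w_true, w, M):
    """H(w) = sum_x p(x|w_true) log(p(x|w_true) / p(x|w))."""
    _, p_true = visible_probs(w_true, M)
    _, p = visible_probs(w, M)
    return sum(a * math.log(a / b) for a, b in zip(p_true, p))

def fd_second_derivative(w_true, i, j, M, h=1e-3):
    """Central finite-difference estimate of d^2 H / d w_ij^2 at w = w_true."""
    def shifted(delta):
        w = [row[:] for row in w_true]
        w[i][j] += delta
        return kullback(w_true, w, M)
    return (shifted(h) - 2.0 * shifted(0.0) + shifted(-h)) / h ** 2

def t_variance(w_true, i, j, M):
    """Variance of t = x_j * tanh(sum_j' w_ij' x_j') under p(x|w_true)."""
    xs, p = visible_probs(w_true, M)
    t = [x[j] * math.tanh(sum(wij * xj for wij, xj in zip(w_true[i], x)))
         for x in xs]
    mean = sum(pk * tk for pk, tk in zip(p, t))
    return sum(pk * tk * tk for pk, tk in zip(p, t)) - mean ** 2

M, K, K0 = 2, 2, 1
w_true = [[0.7, -0.4], [0.0, 0.0]]   # second hidden unit is redundant
hessian_diag = {(i, j): fd_second_derivative(w_true, i, j, M)
                for i in range(K) for j in range(M)}
variances = {(i, j): t_variance(w_true, i, j, M)
             for i in range(K) for j in range(M)}
for key in sorted(hessian_diag):
    print(key, hessian_diag[key], variances[key])
```

In this example the entries for the redundant unit (i = 1) are zero up to discretization error, while those for the true unit (i = 0) are positive, which matches the count of at most K0 M non-zero diagonal elements.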
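[Figure: the $KM \times KM$ Hessian matrix, with at most $K_0 M$ non-zero diagonal elements and the remaining diagonal elements equal to 0.]
By using the lemma, we have
$$\bar{F}(n) \le \frac{MK + MK_0}{4}\log n + C.$$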
Discussion
Comparison with other studies
[Figure: stochastic complexity plotted against the number of training data $n$ in the asymptotic area.]
Regular statistical model: $\frac{KM}{2}\log n + C$
Bayes learning: upper bound obtained by algebraic geometry [Yamazaki]
Mean field approximation (derived result): upper bound $\frac{MK + MK_0}{4}\log n + C$
Conclusion
We derived an upper bound on the mean field stochastic complexity of complete bipartite graph-type Boltzmann machines.
Future works
Lower bound
Comparison with experimental results