Transfer functions: hidden possibilities for better neural networks
Transfer functions: hidden possibilities for better neural networks.
Włodzisław Duch and Norbert Jankowski Department of Computer Methods,
Nicholas Copernicus University, Torun, Poland.
http://www.phys.uni.torun.pl/kmk
Why is this an important issue?
MLPs are universal approximators - no need for other TF?
Wrong bias => poor results, complex networks.
Example of two-class problems:
Class 1 inside a sphere, Class 2 outside:
MLP: at least N+1 hyperplanes, O(N²) parameters.
RBF: 1 Gaussian, O(N) parameters.
Class 1 in the corner defined by the (1,1,...,1) hyperplane, Class 2 outside:
MLP: 1 hyperplane, O(N) parameters.
RBF: many Gaussians, O(N²) parameters, poor approximation.
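A minimal numpy sketch (not from the slides, illustrative only) of this bias match: each problem below is solved exactly by a single unit of the matching transfer function, while the mismatched function type would need many units. The dimension, radius and threshold are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                    # input dimension
X = rng.uniform(-1, 1, size=(5000, N))

# Problem 1: Class 1 inside a sphere of radius r around the origin.
r = 0.9
y_sphere = np.linalg.norm(X, axis=1) < r

# One Gaussian (radial) unit separates it: threshold its output at exp(-r^2).
gauss = np.exp(-np.linalg.norm(X, axis=1) ** 2)
acc_rbf = np.mean((gauss > np.exp(-r ** 2)) == y_sphere)

# Problem 2: Class 1 in the corner cut off by the hyperplane sum_i x_i > theta.
theta = 0.5 * np.sqrt(N)
y_corner = X.sum(axis=1) > theta

# One sigmoidal (fan-in) unit with weights (1,...,1) separates it.
sigm = 1.0 / (1.0 + np.exp(-(X.sum(axis=1) - theta)))
acc_mlp = np.mean((sigm > 0.5) == y_corner)

print(f"sphere problem, one Gaussian unit:  accuracy = {acc_rbf:.3f}")
print(f"corner problem, one sigmoidal unit: accuracy = {acc_mlp:.3f}")
```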
Inspirations
Logical rule: IF x1 > 0 & x2 > 0 THEN Class 1 ELSE Class 2
is not properly represented by either MLP or RBF!
Result: decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid)!
Speed of learning and network complexity depend on TF. Fast learning requires flexible „brain modules” - TF.
• Biological inspirations: sigmoidal neurons are a crude approximation at the basic level of neural tissue.
• Interesting brain functions are done by interacting minicolumns, implementing complex functions.
• Modular networks: networks of networks.
• First step beyond single neurons: transfer functions
providing flexible decision borders.
Transfer functions
Transfer function f(I(X)): vector activation I(X) and scalar output o(I).

1. Fan-in, scalar product activation, hyperplanes:

$$I(\mathbf{X};\mathbf{W}) = \mathbf{W}\cdot\mathbf{X} = \sum_{j=1}^{N} W_j X_j$$

2. Distance functions as activations, for example for Gaussian functions:

$$D(\mathbf{X};\mathbf{R}) = \|\mathbf{X}-\mathbf{R}\|, \qquad D^2(\mathbf{X};\mathbf{R}) = \sum_{j=1}^{N}(X_j - R_j)^2$$

3. Mixed activation functions:

$$A(\mathbf{X};\mathbf{W},\mathbf{R}) = \mathbf{W}\cdot\mathbf{X} + \|\mathbf{X}-\mathbf{R}\|$$
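The three activation types above can be written down in a few lines; this is a hedged sketch (the function names and the mixing weight `alpha` are ours, not from the slides):

```python
import numpy as np

def fanin_activation(X, W, theta=0.0):
    """Scalar-product (fan-in) activation I(X;W) = W.X + theta -> hyperplane borders."""
    return X @ W + theta

def distance_activation(X, R):
    """Distance activation D(X;R) = ||X - R||, used e.g. by Gaussian units."""
    return np.linalg.norm(X - R, axis=-1)

def mixed_activation(X, W, R, alpha=1.0):
    """Mixed activation A(X;W,R) = W.X + alpha*||X - R||, combining both types."""
    return fanin_activation(X, W) + alpha * distance_activation(X, R)

# The same points evaluated with the three activations.
X = np.array([[0.2, -0.5, 1.0], [1.0, 1.0, 1.0]])
W = np.array([1.0, 0.5, -0.3])
R = np.zeros(3)
print(fanin_activation(X, W), distance_activation(X, R), mixed_activation(X, W, R))
```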
Taxonomy - activation functions.
Taxonomy - output functions.
Taxonomy - TF.
TF in Neural Networks
Choices:
1. Homogeneous NN: select the best TF, try several types.
   Ex: RBF networks; SVM kernels (today 50=>80% change).
2. Heterogeneous NN: one network, several types of TF.
   Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections on a space of basis functions.
3. Input enhancement: adding f_i(X) to achieve separability.
   Ex: functional link networks (Pao 1989), tensor products of inputs; D-MLP model.

Heterogeneous networks can be built in several ways:
1. Start from a large network with different TF, use regularization to prune.
2. Construct the network by adding nodes selected from a pool of candidates (sketched below).
3. Use very flexible TF and force them to specialize.
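A hedged sketch of the second constructive strategy (all function and parameter names are ours, not from the talk): candidate hidden units of two TF types compete at each step, and the one that most reduces the residual error is added.

```python
import numpy as np

# Two candidate unit types drawn from a pool with random parameters.
def sigmoid_unit(rng, dim):
    w, b = rng.normal(size=dim), rng.normal()
    return lambda X: 1.0 / (1.0 + np.exp(-(X @ w + b)))

def gaussian_unit(rng, dim):
    c, s = rng.normal(size=dim), abs(rng.normal()) + 0.5
    return lambda X: np.exp(-np.sum((X - c) ** 2, axis=1) / (2 * s ** 2))

def construct_network(X, y, n_nodes=5, pool_size=20, seed=0):
    """Greedy constructive sketch: at each step try a pool of random candidate
    nodes of both TF types and keep the one that lowers the residual error most."""
    rng = np.random.default_rng(seed)
    units, residual = [], y.astype(float).copy()
    for _ in range(n_nodes):
        best = None
        for make in (sigmoid_unit, gaussian_unit):
            for _ in range(pool_size):
                u = make(rng, X.shape[1])
                h = u(X)
                beta = h @ residual / (h @ h + 1e-12)   # least-squares output weight
                err = np.sum((residual - beta * h) ** 2)
                if best is None or err < best[0]:
                    best = (err, u, beta)
        _, u, beta = best
        units.append((u, beta))
        residual -= beta * u(X)
    return units

# Usage sketch: units = construct_network(X_train, y_train);
# the prediction is sum(beta * u(X) for u, beta in units).
```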
Most flexible TFs

Conical functions: mixed activations

$$A_C(\mathbf{X};\mathbf{W},\mathbf{R},\omega) = I(\mathbf{X}-\mathbf{R};\mathbf{W}) + \omega\,D(\mathbf{X};\mathbf{R})$$

Lorentzian: mixed activations

$$C_{GL}(\mathbf{X};\mathbf{W},\mathbf{R},\alpha,\beta) = \frac{1}{1 + \alpha\,I^2(\mathbf{X};\mathbf{W}) + \beta\,D^2(\mathbf{X};\mathbf{R})}$$

Bicentral - separable functions, products of pairs of sigmoids per dimension:

$$Bi(\mathbf{X};\mathbf{D},\mathbf{b},\mathbf{s}) = \prod_{i=1}^{N}\sigma\!\big(e^{s_i}(X_i - D_i + e^{b_i})\big)\,\Big(1-\sigma\!\big(e^{s_i}(X_i - D_i - e^{b_i})\big)\Big)$$

The SBi variant adds per-dimension parameters $\alpha_i,\beta_i$ for extra flexibility.
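A short numerical sketch of the basic bicentral unit above (parameter names follow the formula; the test points are arbitrary): the product of sigmoid pairs gives a soft rectangular window, high inside the box around D and falling off outside.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bicentral(X, D, b, s):
    """Basic bicentral unit: a product over dimensions of pairs of sigmoids,
    giving a soft rectangular window centred at D with half-widths exp(b)
    and edge slopes exp(s)."""
    left  = sigmoid(np.exp(s) * (X - D + np.exp(b)))
    right = sigmoid(np.exp(s) * (X - D - np.exp(b)))
    return np.prod(left * (1.0 - right), axis=-1)

# A 2-D window centred at (0, 0): high inside |x_i| < 1, low outside.
D, b, s = np.zeros(2), np.zeros(2), np.ones(2) * 2.0
for p in ([0.0, 0.0], [0.9, 0.0], [2.0, 0.0], [2.0, 2.0]):
    print(p, round(float(bicentral(np.array(p), D, b, s)), 3))
```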
Bicentral + rotations

6N parameters, most general.

Box in N-1 dimensions x rotated window:

$$C_K(\mathbf{X};\mathbf{D},\mathbf{D}',\mathbf{W}) = L(\mathbf{W}\cdot\mathbf{X};D_N,D'_N)\,\prod_{i=1}^{N-1} L(X_i;D_i,D'_i), \qquad L(X;D,D') = \sigma(X+D)\,\big(1-\sigma(X+D')\big)$$

Bicentral with two independent slopes:

$$SBi(\mathbf{X};\mathbf{D},\mathbf{b},\mathbf{s},\mathbf{s}',\boldsymbol{\alpha},\boldsymbol{\beta}) = \prod_{i=1}^{N}\sigma\!\big(e^{s_i}(X_i - D_i + e^{b_i})\big)\,\Big(1-\sigma\!\big(e^{s'_i}(X_i - D_i - e^{b_i})\big)\Big)$$

with additional per-dimension parameters $\alpha_i,\beta_i$.

Rotation matrix with band structure makes 2x2 rotations:

$$SB_R(\mathbf{X};\mathbf{D},\mathbf{D}',\boldsymbol{\alpha},\boldsymbol{\beta},\mathbf{R}) = \prod_{i=1}^{N}\sigma\!\big(\alpha_i(\mathbf{R}_i\mathbf{X} - D_i)\big)\,\Big(1-\sigma\!\big(\beta_i(\mathbf{R}_i\mathbf{X} - D'_i)\big)\Big), \qquad R_{ii}=1,\; R_{i,i+1}=s_i \text{ for } i=1..N-1$$
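The band-structure matrix from the last formula is simple to construct; a minimal sketch (the function name is ours):

```python
import numpy as np

def band_rotation_matrix(s):
    """Band-structure rotation matrix used by the rotated bicentral units:
    R[i, i] = 1 and R[i, i+1] = s[i], all other entries zero, so each s[i]
    couples only neighbouring dimensions (cheap 2x2 rotations)."""
    n = len(s) + 1
    R = np.eye(n)
    R[np.arange(n - 1), np.arange(1, n)] = s
    return R

s = np.array([0.3, -0.7, 0.1])
print(band_rotation_matrix(s))
```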
Some properties of TFs

For logistic functions, the difference of two sigmoids is proportional to their product:

$$\sigma(x+b) - \sigma(x-b) = \big(1 - e^{-2b}\big)\,\sigma(x+b)\,\big(1-\sigma(x-b)\big)$$

Renormalization of a Gaussian gives a logistic function:

$$G_R(\mathbf{X};\mathbf{D},\mathbf{b}) = \frac{G(\mathbf{X};\mathbf{D},\mathbf{b})}{G(\mathbf{X};\mathbf{D},\mathbf{b}) + G(\mathbf{X};-\mathbf{D},\mathbf{b})} = \frac{1}{1+\exp\!\Big(-\sum_{i=1}^{N} 4D_iX_i/b_i^2\Big)} = \sigma(\mathbf{W}\cdot\mathbf{X}), \quad \text{where } W_i = 4D_i/b_i^2.$$
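The renormalization identity above is easy to check numerically; a small sketch (the random centres, dispersions and test points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
D = rng.normal(size=N)                  # Gaussian centre
b = rng.uniform(0.5, 2.0, size=N)       # per-dimension dispersions
X = rng.normal(size=(4, N))             # a few test points

def gauss(X, D, b):
    return np.exp(-np.sum(((X - D) / b) ** 2, axis=1))

# Renormalised pair of Gaussians centred at D and -D ...
g_renorm = gauss(X, D, b) / (gauss(X, D, b) + gauss(X, -D, b))

# ... equals a logistic unit with weights W_i = 4 D_i / b_i^2.
W = 4 * D / b ** 2
logistic = 1.0 / (1.0 + np.exp(-(X @ W)))

print(np.max(np.abs(g_renorm - logistic)))   # ~1e-16: the two forms coincide
```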
Example of input transformation

Minkowski's distance function:

$$D(\mathbf{X},\mathbf{W})^{\alpha} = \sum_{i=1}^{N} d(X_i,W_i)^{\alpha} = \sum_{i=1}^{N} |X_i - W_i|^{\alpha}$$

Sigmoidal activation changed to:

$$\mathbf{W}\cdot\mathbf{X} \;\to\; d_0 - D(\mathbf{W},\mathbf{X})$$

Adding a single input renormalizing the vector:

$$\mathbf{X}_e = \big(\mathbf{X}, X_{N+1}\big), \qquad D(\mathbf{X}_e,\mathbf{0}) = \text{const}$$
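A hedged sketch of the renormalizing extra input for the Euclidean case (the choice X_{N+1} = sqrt(R² - ||X||²) and all variable names are ours): once every extended vector has the same length, squared distance is an affine function of the dot product, so hyperplane and distance-based decision borders coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(6, 4))          # original inputs
R = np.sqrt((X ** 2).sum(axis=1)).max() + 1  # any radius larger than all norms

# Extra coordinate renormalizing every vector to the same length R.
X_e = np.hstack([X, np.sqrt(R ** 2 - (X ** 2).sum(axis=1, keepdims=True))])
print(np.linalg.norm(X_e, axis=1))           # all equal to R

# On this sphere, ||X_e - W_e||^2 = const - 2 W.X for any weight vector W.
W = rng.normal(size=4)
W_e = np.hstack([W, [0.0]])
lhs = ((X_e - W_e) ** 2).sum(axis=1)         # squared Euclidean distance
rhs = R ** 2 + W @ W - 2 * (X_e @ W_e)       # const - 2 W.X
print(np.max(np.abs(lhs - rhs)))             # ~0
```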
Conclusions

Radial and sigmoidal functions are not the only choice.
StatLog report: large differences between RBF and MLP on many datasets.
Better learning cannot repair the wrong bias of the model.
Systematic investigation and taxonomy of TF is worthwhile.
Networks should select/optimize their functions.
Open questions:
What is the optimal balance between complex nodes and complex interactions (weights)?
How to train heterogeneous networks?
How to optimize nodes in constructive algorithms?
Hierarchical, modular networks: nodes that are networks themselves.
The End?
Perhaps the beginning ...