Study of Sparse Online Gaussian Process for Regression
EE645 Final Project, May 2005
Eric Saint Georges
Contents
A. Introduction
B. OGP
   1. Definition of Gaussian Process
   2. Sparse Online GP algorithm (OGP)
C. Simulation Results
   1. Comparison with LS-SVM on Boston Housing data set (batch)
   2. Time Series Prediction using OGP
   3. Optical Beam Position Optimization
D. Conclusion
Introduction
Possible application of OGP to optical free-space communication, for monitoring and optimization in a noisy environment, using the sparse OGP algorithm developed by Lehel Csató et al.
Contents
A. Introduction
B. OGP
   1. Definition of Gaussian Process
   2. Sparse Online GP algorithm (OGP)
C. Simulation Results
   1. Comparison with LS-SVM on Boston Housing data set (batch)
   2. Time Series Prediction using OGP
   3. Optical Beam Position Optimization
D. Conclusion
Gaussian Process Definition
• A collection of indexed random variables, defined by:
  – a mean
  – a covariance defined by a kernel function
• Kernel function: can be any positive semi-definite function
  – Defines the assumptions on the prior distribution
  – Wide scope of choices
  – Popular kernels are stationary functions: f(x − x')
• The index can be time, space, or anything else
Online GP Process
• Bayesian process:
  Prior distribution (GP) + likelihood function → posterior distribution (using Bayes' rule)
Solving a Gaussian Process:
Given n inputs x_i and n measurements t_i (with t_i = y_i + e_i, the noise e_i being zero-mean with variance σ²):
• The prior distribution over the y_i is given by the covariance matrix K_ij = C(x_i, x_j).
• The prior distribution over the measurements t_i is given by K + σ² I_n.
• Prediction of the function y* for an input x* consists in calculating the mean and variance:
  y*(x*) = Σ_i α_i C(x_i, x*)
  σ²(x*) = C(x*, x*) − k(x*)ᵀ (K + σ² I_n)⁻¹ k(x*)
  with α = (K + σ² I_n)⁻¹ t and k(x*) = [C(x_1, x*), …, C(x_n, x*)]ᵀ
Solving the Gaussian Process:
Solving requires the inversion of (K + σ² I_n), an n × n matrix, n being the number of training inputs.
Memory grows as O(n²) and CPU time as O(n³).
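A minimal NumPy sketch of these prediction formulas (the helper names are mine, not the OGP toolbox's); the np.linalg.solve call on (K + σ² I_n) is the O(n³) step just mentioned:

import numpy as np

def rbf_kernel(X1, X2, a=1.0, s=10.0):
    # K(x, x') = a * exp(-sum_i (x_i - x'_i)^2 / s), the kernel form used in the deck
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return a * np.exp(-d2 / s)

def gp_predict(X, t, Xstar, sigma2=0.1, a=1.0, s=10.0):
    K = rbf_kernel(X, X, a, s)                      # n x n prior covariance
    A = K + sigma2 * np.eye(len(X))                 # (K + sigma^2 I_n)
    alpha = np.linalg.solve(A, t)                   # alpha = (K + sigma^2 I_n)^-1 t, O(n^3)
    k = rbf_kernel(X, Xstar, a, s)                  # k(x*) for each test input
    mean = k.T @ alpha                              # y*(x*) = sum_i alpha_i C(x_i, x*)
    var = np.diag(rbf_kernel(Xstar, Xstar, a, s) - k.T @ np.linalg.solve(A, k))
    return mean, var

With n training points this stores an n × n matrix (O(n²) memory) and solves it in O(n³) time, which is the motivation for the sparse online variant discussed later.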
Sampling from a Gaussian Process:
• Example of kernel:
  K(x, x') = a · exp( −Σ_{i=1}^{n} (x_i − x'_i)² / s_i )
  a = amplitude
  s = scale (smoothness)
Sampling from a GP: before Training
[Figure: samples drawn from the GP prior with scale = 10, over x ∈ [−20, 20]; the plot shows a sample, the mean, and the ± standard deviation envelope.]
Sampling from a GP: before Training
[Figure, two panels: samples drawn from the GP prior with scale = 1 (left) and scale = 100 (right); each panel shows a sample, the mean, and the ± standard deviation envelope.]
Effect of scale: small scale = 1 (left) versus large scale = 100 (right).
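The effect of the scale can be reproduced with a short sketch, drawing samples from the zero-mean prior N(0, K) for several values of s (illustrative values of my choosing, not the deck's exact settings):

import numpy as np

# Small s gives rapidly varying samples, large s gives smooth ones,
# as in the figures above.
def rbf_kernel(x, a=9.0, s=10.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return a * np.exp(-d2 / s)

x = np.linspace(-20, 20, 200)
rng = np.random.default_rng(0)
for s in (1.0, 10.0, 100.0):
    K = rbf_kernel(x, s=s) + 1e-8 * np.eye(len(x))     # jitter for numerical stability
    samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
    band = np.sqrt(np.diag(K))                          # +/- standard deviation envelope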
Sampling from a GP: After Training
After 3 training samples
[Figure: samples drawn with scale = 50 over x ∈ [−20, 20]; the plot shows a sample, the mean, and the ± standard deviation envelope.]
Sampling from a GP: After Training
After 10 training samples
[Figure: samples drawn with scale = 50 over x ∈ [−20, 20]; the plot shows a sample, the mean, and the ± standard deviation envelope.]
Online Gaussian Process: Issues
Two major issues with the GP approach:
1. Data set size is limited by memory and CPU
2. The posterior distribution is usually not Gaussian
Both are addressed by the algorithm developed by Csató et al.
Sparse Online Gaussian Algorithm
• Data set size limited by memory and CPU → sparsity created using a limited number of SVs
• Posterior distribution not usually Gaussian → Gaussian approximation
• Matlab software available on the Web
Sparse Online Gaussian Algorithm
The SOGP process is defined by:
– Kernel parameters: an (m + 2)-vector for the RBF kernel
– Support Vectors: a d × 1 vector (indexes)
– GP parameters: α, a d × 1 vector, and K, a d × n matrix
(m = dimension of the input space, d = number of support vectors)
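Schematically, the online update maintains a capped set of Basis Vectors (BVs). A structural sketch (names are mine; it keeps only the BV bookkeeping and omits Csató's actual posterior update equations):

import numpy as np

def rbf(x1, x2, a=1.0, s=10.0):
    return a * np.exp(-np.sum((np.asarray(x1) - np.asarray(x2)) ** 2) / s)

class SparseOnlineGPSketch:
    """Schematic BV-set maintenance only; not the full Csato/Opper update."""
    def __init__(self, max_bv=50, tol=1e-3):
        self.max_bv, self.tol = max_bv, tol
        self.BV = []                          # current basis (support) vectors

    def novelty(self, x):
        # gamma = k(x,x) - k^T K_BV^{-1} k : part of k(x) outside span(BV)
        if not self.BV:
            return rbf(x, x)
        K = np.array([[rbf(u, v) for v in self.BV] for u in self.BV])
        k = np.array([rbf(u, x) for u in self.BV])
        return rbf(x, x) - k @ np.linalg.solve(K + 1e-8 * np.eye(len(k)), k)

    def update(self, x, t):
        # the real algorithm first performs the Gaussian-approximate
        # posterior update (alpha and covariance), then manages the BV set
        if self.novelty(x) > self.tol:
            self.BV.append(x)                 # novel input becomes a new BV
        if len(self.BV) > self.max_bv:        # over budget: prune one BV
            scores = []
            for i in range(len(self.BV)):
                rest = SparseOnlineGPSketch(self.max_bv, self.tol)
                rest.BV = self.BV[:i] + self.BV[i + 1:]
                scores.append(rest.novelty(self.BV[i]))
            self.BV.pop(int(np.argmin(scores)))

In the real algorithm each step also updates α and the posterior covariance under the Gaussian approximation before deciding whether to extend or prune the BV set.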
Contents
A. Introduction
B. OGP
   1. Definition of Gaussian Process
   2. Sparse Online GP algorithm (SOGP)
C. Simulation Results
   1. Comparison with LS-SVM on Boston Housing data set (batch)
   2. Time Series Prediction using OGP
   3. Optical Beam Position Optimization
D. Conclusion
LS-SVM on Boston Housing Data Set
RBF kernel, C = 10, kernel parameter = 4
304 training samples, averaged over 10 random draws
Average MSE on Training:        3.0
Average MSE on Test:            6.5
Standard Deviation on Training: 0.4
Standard Deviation on Test:     1.6
Average CPU time: 3 s per run
OGP on Boston Housing Data Set
• Kernel: K(x, x') = a · exp( −Σ_i (x_i − x'_i)² / s_i )
• Initial hyper-parameters: a and s_i (i = 1 to 13 for BH)
• Number of hyper-parameter optimization iterations: tried between 3 and 6
• Max number of Support Vectors: variable
6 iterations, MaxBV between 10 and 250
OGP on Boston Housing Data Set
3 iterations, MaxBV between 10 and 150
[Figure: Train and Test MSE versus max number of Support Vectors (10, 20, 50, 100, 150), maxHyp = 3. MSE averaged over 5 draws, hyper-parameters updated 3 times, 304 training samples, total elapsed time 2733 s. Results saved in Result_5120_4.mat, net structure in net_5120_4.mat, figure in Fig_5120_4.jpg. Run 5120_4, 30-Apr-2005.]
OGP on Boston Housing Data Set
4 iterations, MaxBV between 10 and 150
[Figure: Train and Test MSE versus max number of Support Vectors (10, 20, 50, 100, 150), maxHyp = 4. MSE averaged over 5 draws, hyper-parameters updated 4 times, 304 training samples, total elapsed time 3694 s. Results saved in Result_5120_5.mat, net structure in net_5120_5.mat, figure in Fig_5120_5.jpg. Run 5120_5, 30-Apr-2005.]
OGP on Boston Housing Data Set
6 iterations, MaxBV between 10 and 150
[Figure: Train and Test MSE versus max number of Support Vectors (10, 20, 50, 100, 150), maxHyp = 6. MSE averaged over 5 draws, hyper-parameters updated 6 times, 304 training samples, total elapsed time 5785 s. Results saved in Result_5120_6.mat, net structure in net_5120_6.mat, figure in Fig_5120_6.jpg. Run 5120_6, 30-Apr-2005.]
OGP on Boston Housing Data Set
CPU Time
[Figure, two panels: left, processing time (s) versus maximum number of SVs, for MaxHyp = 3, 4, and 6; right, processing time normalized by maxBV and maxHyp versus number of Support Vectors, fitted by a·(b + SVs²)/SVs as a function of SVs.]
OGP on Boston Housing Data Set
Run with 4 iterations, MaxBV between 10 and 60
[Figure: Train and Test MSE versus max number of Support Vectors (10, 20, 30, 40, 50, 60). MSE averaged over 10 draws, hyper-parameters updated 4 times, 304 training samples, total elapsed time 5749 s. Results saved in Result_5121_1.mat, net structure in net_5121_1.mat, figure in Fig_5121_1.jpg. Run 5121_1, 01-May-2005.]
OGP on Boston Housing Data Set
Final run with 4 iterations, MaxBV 30 and 40, averaged over 50 random draws
[Figure: Train and Test MSE versus max number of Support Vectors (30, 40). MSE averaged over 50 draws, hyper-parameters updated 4 times, 304 training samples, total elapsed time 8335 s. Results saved in Result_5122_1.mat, net structure in net_5122_1.mat, figure in Fig_5122_1.jpg. Run 5122_1, 02-May-2005.]
Max Number of SVs               30      40
Average MSE on Training         3.80    3.20
Average MSE on Test             7.10    6.90
Standard Deviation on Training  0.23    0.24
Standard Deviation on Test      1.20    1.10
OGP on Boston Housing Data Set: Conclusion
• MSE not as good as LS-SVM (6.9 versus 6.5), but the standard deviation is better than LS-SVM's (1.1 versus 1.6).
• CPU time is much longer (90 s versus 3 s per run), but it grows more slowly with the number of samples than LS-SVM, so OGP might do better on large data sets.
TSP (Time Series Prediction)
[Figure: TSP data versus time (samples 0–5000); training data and prediction data.]
OGP on TSP
[Figure: training data, test data, GP estimation, and OGP prediction over samples 0–1000. RUN = 10; 980 training samples; initial kpar(1) = 1.00e-2, final kpar(1) = 1.30e-3; kpar(2) = 2000; no overlap between sections; max number of Support Vectors = 50; MSE on prediction = 2489.7. Run 5128_18, 08-May-2005.]
OGP on TSP: Initial Runs
[Figure: training data, test data, GP estimation, and OGP prediction over samples 800–1000. RUN = 10; initial kpar(1) = 1.00e-2, final kpar(1) = 1.61e-2; kpar(2) = 2000; no overlap between sections; max number of Support Vectors = 50; MSE on prediction = 1400.2. Run 5128_13, 08-May-2005.]
OGP on TSP: Initial Runs
[Figure: training data, test data, GP estimation, and OGP prediction over samples 700–1000. RUN = 10; initial kpar(1) = 1.00e-2, final kpar(1) = 2.43e-3; kpar(2) = 2000; no overlap between sections; max number of Support Vectors = 50; MSE on prediction = 91.1. Run 5128_15, 08-May-2005.]
OGP on TSP: Initial Runs
[Figure: training data, test data, GP estimation, OGP prediction, and Support Vectors over samples 700–1000. RUN = 10; 281 training samples; initial kpar(1) = 1.00e-2, final kpar(1) = 1.46e-2; kpar(2) = 2000; no overlap between sections; MaxBV = 50; MSE on prediction = 1131.6. Run 5128_33, 08-May-2005.]
[Figure: same setup and 281 training samples; initial kpar(1) = 1.00e-2, final kpar(1) = 2.45e-3; MSE on prediction = 95.1. Run 5128_24, 08-May-2005.]
OGP on TSP: Local Minimum
Both runs start from initial kpar = 1e-2:
• Run 5128_33: final kpar = 1.46e-2, MSE = 1132
• Run 5128_24: final kpar = 2.45e-3, MSE = 95
Starting from the same initial value, the hyper-parameter optimization can end in a poor local minimum.
OGP on TSP: Impact of Over-fitting
[Figure: detail of samples 940–1040. RUN = 7; kpar(1) = 0.3, kpar(2) = 2000; overlap between sections = 30 training samples; max number of Support Vectors = 50.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 6 s
[Figure: training data, test data, GP estimation, OGP prediction, and Support Vectors over samples 900–1000. RUN = 11; 81 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 2.42e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 124.6. Run 5128_46, 08-May-2005.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 16 s
[Figure: samples 800–1000. RUN = 11; 181 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 2.25e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 91.7. Run 5128_52, 08-May-2005.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 27 s
[Figure: samples 700–1000. RUN = 11; 281 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 2.21e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 85.9. Run 5128_42, 08-May-2005.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 45 s
[Figure: samples 600–1000. RUN = 11; 381 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 2.66e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 104.8. Run 5128_44, 08-May-2005.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 109 s
[Figure: samples 500–1000. RUN = 11; 481 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 3.66e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 99.1. Run 5128_45, 08-May-2005.]
OGP on TSP: Impact of Number of Samples on Prediction
cpu = 233 s
[Figure: samples 400–1000. RUN = 11; 581 training samples; initial kpar(1) = 1.00e-3, final kpar(1) = 3.58e-3; kpar(2) = 2000; no overlap; MaxBV = 50; MSE on prediction = 632.6. Run 5128_51, 08-May-2005.]
OGP on TSP: Impact of Number of SVs on Prediction
cpu = 19 s
[Figure: samples 800–1000. RUN = 11; 181 training samples; max number of Support Vectors = 10; initial kpar(1) = 1.00e-3, final kpar(1) = 2.23e-3; kpar(2) = 2000; no overlap; MSE on prediction = 101.2. Run 5128_54, 08-May-2005.]
OGP on TSP: Impact of Number of SVs on Prediction
cpu = 16 s
[Figure: samples 800–1000. RUN = 11; 181 training samples; max number of Support Vectors = 50; initial kpar(1) = 1.00e-3, final kpar(1) = 2.25e-3; kpar(2) = 2000; no overlap; MSE on prediction = 91.7. Run 5128_52, 08-May-2005.]
OGP on TSP: Impact of Number of SVs on Prediction
cpu = 16 s
[Figure: samples 800–1000. RUN = 11; 181 training samples; max number of Support Vectors = 100; initial kpar(1) = 1.00e-3, final kpar(1) = 2.24e-3; kpar(2) = 2000; no overlap; MSE on prediction = 88.9. Run 5128_53, 08-May-2005.]
OGP on TSP: Final Runs
Running 200 samples at a time, with a 30-sample overlap.
[Figure: OGP predictions over samples 0–2500, assembled from successive sections of 200–230 training samples with a 30-sample overlap between sections (runs 5128_69 through 5128_81, 08-May-2005). RUN = 11; kpar(1) = 0.001, kpar(2) = 2000; max number of Support Vectors = 50 per section.]
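A possible driver loop for these chunked runs, with stand-in functions in place of the Matlab toolbox calls (the actual OGP training and prediction routines are not reproduced here):

import numpy as np

def train_section(model, window, max_bv=50):
    # stand-in for the OGP toolbox's online training on one section;
    # a real implementation would warm-start from the previous model
    return {"last": window[-1], "max_bv": max_bv}

def predict_section(model, n_ahead):
    # stand-in predictor (persistence) in place of the OGP prediction
    return np.full(n_ahead, model["last"])

def run_sections(series, section=200, overlap=30, max_bv=50):
    # process the series 200 samples at a time, re-using the last 30
    # samples of the previous section so consecutive models line up
    preds, start, model = [], 0, None
    while start < len(series):
        stop = min(start + section, len(series))
        window = series[max(0, start - overlap):stop]
        model = train_section(model, window, max_bv)
        preds.append(predict_section(model, n_ahead=overlap))
        start = stop
    return preds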
OGP on TSP: Why an overlap?
[Figure: detail of samples 1880–2080 from a run with no overlap between sections. RUN = 6; kpar(1) = 1.0, kpar(2) = 2200.]
OGP on TSP: Final Runs
[Figure: prediction over samples 2000–5000. RUN = 11; 230 training samples per section, 30-sample overlap; initial kpar(1) = 1.00e-3, final kpar(1) = 9.24e-4; kpar(2) = 2000; MaxBV = 50. Run 5128_100, 08-May-2005.]
It does not always behave!...
OGP on TSP: Conclusion
Difficult to find the right set of parameters:
• Initial kernel parameter
• Number of Support Vectors
• Number of training samples per run
Beam Position Optimization
Gaussian Beam
[Figure: 3-D surface of beam intensity versus x and y, with small noise.]
Gaussian Beam
[Figure: 3-D surface of beam intensity versus x and y, with noise; the same view is repeated for several noise realizations.]
Gaussian Beam Position Optimization
[Figure: 3-D surface of beam intensity versus x and y.]
Sampling the beam at a given position and measuring the power.
Objective: find the top of the beam.
• Idea (see the sketch below):
  – With a few initial samples, use the OGP to get an estimate of the beam profile and position
  – Move toward the max of the estimate
  – Add this new sample to the training set
  – Iterate
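A toy version of this loop, with a hypothetical beam() measurement (made-up center, width, and noise level) and a plain batch GP standing in for the OGP:

import numpy as np

rng = np.random.default_rng(1)

def beam(p, center=(40.0, -25.0), width=80.0, noise=0.05):
    # hypothetical noisy power measurement of a Gaussian beam at position p
    r2 = (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2
    return np.exp(-r2 / width ** 2) + noise * rng.standard_normal()

def rbf_kernel(X1, X2, a=1.0, s=2 * 80.0 ** 2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return a * np.exp(-d2 / s)

def gp_mean(X, t, Xstar, sigma2=0.05 ** 2):
    # posterior mean of a plain GP fit to the samples collected so far
    A = rbf_kernel(X, X) + sigma2 * np.eye(len(X))
    return rbf_kernel(X, Xstar).T @ np.linalg.solve(A, t)

# a few initial samples, then: fit GP -> move to max of estimate -> measure -> add
grid = np.stack(np.meshgrid(np.linspace(-200, 200, 41),
                            np.linspace(-200, 200, 41)), -1).reshape(-1, 2)
X = rng.uniform(-200, 200, size=(5, 2))
t = np.array([beam(p) for p in X])
for _ in range(20):
    p_next = grid[np.argmax(gp_mean(X, t, grid))]   # top of the current estimate
    X = np.vstack([X, p_next])
    t = np.append(t, beam(p_next))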
Beam Position Optimization
[Sequence of figures, one per iteration of the search: each slide shows the OGP estimate of the beam profile (normalized intensity over x, y ∈ [−200, 200]) side by side with the true noisy beam intensity, as samples are added one at a time.]
OGP for Beam Optimization: Conclusion
• Works faster than the current algorithm (finds the top in fewer steps)
• Does not work well when there is no noise
• Can be improved
OGP for Beam Optimization: Possible Improvements
• Specific kernel: s1 = s2 (the beam is symmetric in x and y)
• Use the known beam divergence to set the initial kernel parameters
• Optimize the choice of the next sample (one option is sketched below):
  – Going directly to the estimated top might not be the best, because it does not help to improve the estimate
  – Improve robustness by minimizing the probability of sampling at lower power
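One way to realize the last point (an assumption on my part, not the method used in the slides) is an upper-confidence-bound rule: sample where the mean plus a margin of standard deviations is largest, so the search avoids points that are confidently low-power while still improving the estimate:

import numpy as np

def next_sample(grid, mean, var, kappa=2.0):
    # hypothetical acquisition rule: trade off estimated power (mean)
    # against remaining uncertainty (std), assuming a gp_predict-style
    # routine (as sketched earlier) returned both over the grid
    return grid[np.argmax(mean + kappa * np.sqrt(np.maximum(var, 0.0)))]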
Contents
A. Introduction
B. OGP
   1. Definition of Gaussian Process
   2. Sparse Online GP algorithm (OGP)
C. Simulation Results
   1. Comparison with LS-SVM on Boston Housing data set (batch)
   2. Time Series Prediction using OGP
   3. Optical Beam Position Optimization
D. Conclusion
Conclusion
• OGP is an interesting tool
• Complex software
• Many tunings needed to ensure stability and convergence
• Not easy to use
• Next steps: more comparisons with online LS-SVM
  – Performance
  – CPU time
References
[1] C. K. Williams, Gaussian Processes, March 1, 2002.
[2] M. Gibbs and D. J. C. MacKay, Efficient Implementation of Gaussian Processes, May 28, 1997.
[3] L. Csató and M. Opper, Sparse Online Gaussian Processes, October 9, 2002.
[4] C. M. Bishop, Neural Networks for Pattern Recognition.
[5] Time series competition data, downloaded from http://www.esat.kuleuven.ac.be/sista/workshop/competition.html
[6] Csató's OGP toolbox for Matlab and demo program tutorial, http://www.kyb.tuebingen.mpg.de/bs/people/csatol/ogp/