TRANSCRIPT
Support Vector Machine With Adaptive Parameters in Financial Time Series Forecasting
by L. J. Cao and Francis E. H. Tay
IEEE Transactions on Neural Networks, Vol. 14, No. 6, Nov. 2003
Presented by Pooja Hegde
CIS 525: Neural Computation
Spring 2004, Instructor: Dr. Vucetic
Presentation Outline Introduction
Motivation and introduction of a novel approach: SVM
Background SVMs in Regression Estimation
Application of SVMs in financial forecasting Experimental setup and results
Experimental analysis of SVM parameters and results
Adaptive Support Vector Machines (ASVM)
Experimental setup and results
Conclusions
Introduction
Financial time series forecasting is one of the most challenging applications of modern
time series prediction.
Characteristics:
Noisy: complete information from the past behavior of financial markets is unavailable,
so the dependency between future and past prices cannot be fully captured.
Non-stationary: the distribution of a financial time series changes over time.
The learning algorithm needs to incorporate this characteristic: information
given by recent data points should be weighted more heavily than that given by
distant data points.
Introduction (contd.)
Back-propagation (BP) neural networks have been used successfully for modeling financial time series.
BP neural networks are universal function approximators that can map any nonlinear function without a priori assumptions about the properties of the data.
They are effective in describing the dynamics of non-stationary time series due to their non-parametric, noise-tolerant and adaptive properties.
Then what's the problem!!
Need for a large number of controlling parameters.
Difficulty in obtaining a stable solution.
Danger of overfitting: the network captures not only the useful information in the training data but also the unwanted noise, which leads to poor generalization.
A Novel Approach: SVMs
Support Vector Machines are used in a number of areas ranging from pattern recognition to regression estimation.
Reason: remarkable characteristics of SVMs.
Good generalization performance: SVMs implement the Structural Risk Minimization principle, which seeks to minimize an upper bound on the generalization error rather than only the training error.
Absence of local minima: training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so the solution is unique and globally optimal.
Sparse representation of the solution: the SVM solution depends only on a subset of the training data points, called support vectors.
Background Theory of SVMs in Regression Estimation
Given a set of data points (x1, y1), (x2, y2), ..., (xl, yl) randomly and independently generated from an unknown function, the SVM approximates the function using the form

    f(x) = w · φ(x) + b

where φ(x) is the mapping of the inputs into a high-dimensional feature space. The coefficients w and b are estimated by minimizing the regularized risk function

    R = C (1/l) Σi Lε(yi, f(xi)) + (1/2) ||w||²

where Lε is the ε-insensitive loss, Lε(y, f(x)) = max(|y − f(x)| − ε, 0). To estimate w and b, the above is transformed into the primal problem by introducing positive slack variables ξi, ξi*.
Background Theory of SVMs in Regression Estimation (contd.)
Introducing Lagrange multipliers and exploiting the optimality constraints, the decision function takes the explicit form

    f(x) = Σi (ai − ai*) K(xi, x) + b

where ai and ai* are the Lagrange multipliers. They satisfy ai · ai* = 0, ai ≥ 0, ai* ≥ 0, and are obtained by maximizing the dual function

    W(a, a*) = Σi yi (ai − ai*) − ε Σi (ai + ai*) − (1/2) Σi Σj (ai − ai*)(aj − aj*) K(xi, xj)

subject to Σi (ai − ai*) = 0 and ai, ai* ∈ [0, C].
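The dual problem above is what off-the-shelf SVR solvers implement. A minimal sketch using scikit-learn's SVR (an assumed stand-in; the paper does not specify its software) on synthetic data illustrates the sparse representation: only points on or outside the ε-tube become support vectors.

```python
# Sketch: epsilon-SVR with a Gaussian (RBF) kernel on synthetic data,
# illustrating that the fitted function depends only on a subset of the
# training points (the support vectors). Not the paper's futures data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

# Points strictly inside the epsilon-tube get a_i = a_i* = 0 and drop out
# of the decision function, so the solution is sparse.
print(len(model.support_), "support vectors out of", len(X))
```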
Feasibility of Applying SVM in Financial Forecasting
Experimental Setup: Data Sets-
The daily closing prices of five real
futures contracts from the Chicago
Mercantile Market are used as datasets.
The original closing price is transformed into a
five-day relative difference in percentage of price (RDP).
Feasibility of Applying SVM in Financial Forecasting
Input variables are determined from four
lagged RDP values based on 5-day periods
(RDP-5, RDP-10, RDP-15, RDP-20) and
one transformed closing price (EMA100).
Output variable: RDP+5.
Z-score normalization is used for
normalizing the time series containing
outliers.
Walk-forward testing routine is used to divide
whole dataset into 5 overlapping training-validation-testing sets.
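A hedged sketch of the preprocessing just described, using the slide's definitions (RDP-k as a k-day percentage difference of price, EMA100 as the closing price minus a 100-day exponential moving average, RDP+5 as the 5-day-ahead RDP). The price series is synthetic, and any additional smoothing of the target used in the paper is omitted.

```python
# Build the five inputs (RDP-5/10/15/20, EMA100) and the target RDP+5
# from a synthetic daily closing-price series, following the slide's
# definitions: RDP-k = 100*(p_i - p_{i-k})/p_{i-k}.
import numpy as np
import pandas as pd

prices = pd.Series(100 + np.cumsum(np.random.default_rng(1).normal(size=600)))

def rdp(p, k):
    return 100 * (p - p.shift(k)) / p.shift(k)

features = pd.DataFrame({
    "RDP-5":  rdp(prices, 5),
    "RDP-10": rdp(prices, 10),
    "RDP-15": rdp(prices, 15),
    "RDP-20": rdp(prices, 20),
    "EMA100": prices - prices.ewm(span=100).mean(),
})
# RDP+5 at time i is the RDP over the NEXT 5 days: 100*(p_{i+5} - p_i)/p_i.
target = rdp(prices, 5).shift(-5).rename("RDP+5")

data = pd.concat([features, target], axis=1).dropna()
print(data.shape)
```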
Feasibility of Applying SVM in Financial Forecasting
Performance Criteria:
NMSE and MAE: measures of the deviation between actual and predicted values; smaller values indicate a better predictor.
DS: indicates the correctness of the predicted direction of RDP+5, given as a percentage; a larger value suggests a better predictor.
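The three criteria can be written down directly; the definitions below follow their common forms (the paper's exact DS formula may differ in small details):

```python
# Hedged implementations of the three performance criteria.
import numpy as np

def nmse(y, yhat):
    # Normalized mean squared error: MSE divided by the variance of y,
    # so NMSE near 1 means "no better than predicting the mean".
    y, yhat = np.asarray(y), np.asarray(yhat)
    return np.mean((y - yhat) ** 2) / np.var(y)

def mae(y, yhat):
    # Mean absolute error.
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))

def ds(y, yhat):
    # Directional symmetry: percentage of steps where the predicted change
    # has the same sign as the actual change.
    y, yhat = np.asarray(y), np.asarray(yhat)
    correct = (y[1:] - y[:-1]) * (yhat[1:] - y[:-1]) >= 0
    return 100 * np.mean(correct)

y = np.array([1.0, 2.0, 1.5, 2.5])
print(nmse(y, y), mae(y, y), ds(y, y))
```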
A Gaussian kernel is used as the kernel function of the SVM. The results on the validation set are used to choose the optimal parameters (C, ε and δ²) of the SVM.
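A sketch of validation-based parameter selection, assuming scikit-learn's SVR, whose gamma plays the role of the inverse kernel width δ²; the data and the candidate grids are illustrative, not the paper's:

```python
# Choose (C, epsilon, gamma) for an RBF-kernel SVR by minimizing error on
# a held-out validation set, as the paper does; synthetic data.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

best = None
for C in (0.1, 1.0, 10.0):
    for eps in (0.01, 0.1):
        for gamma in (0.1, 1.0):
            m = SVR(kernel="rbf", C=C, epsilon=eps, gamma=gamma).fit(X_tr, y_tr)
            err = mean_squared_error(y_val, m.predict(X_val))
            if best is None or err < best[0]:
                best = (err, C, eps, gamma)

print("best validation MSE %.4f with C=%s, eps=%s, gamma=%s" % best)
```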
Feasibility of Applying SVM in Financial Forecasting
Benchmarks
Standard 3-layer BP neural network with 5 input nodes and 1 output node.
The number of hidden nodes, the learning rate and the number of epochs are chosen based on the validation set.
Sigmoid transfer function for the hidden nodes and linear transfer function for the output node.
The network is trained by stochastic gradient descent.
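The BP benchmark under the stated setup can be approximated with scikit-learn's MLPRegressor (an assumed stand-in, not the authors' code): one hidden layer of sigmoid units, a linear output by construction, trained with SGD on synthetic 5-input data.

```python
# Sketch of the 3-layer BP benchmark: 5 inputs, sigmoid hidden layer,
# linear output, stochastic gradient descent. Synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))          # 5 input nodes
y = X @ np.array([0.5, -0.2, 0.1, 0.3, -0.4]) + rng.normal(scale=0.1, size=400)

bp = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                  solver="sgd", learning_rate_init=0.01,
                  max_iter=2000, random_state=0)
bp.fit(X, y)
print("training R^2:", round(bp.score(X, y), 3))
```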
Regularized RBF Neural Network
Minimizes a risk function consisting of the empirical error and a regularization term.
The regularized RBF neural network software used was developed by Muller et al. and can be downloaded from http://www.kernel-machines.org.
Centers, variances and output weights are all adjusted during training.
The number of hidden nodes and the regularization parameter are chosen based on the validation set.
Results
In all futures contracts, the largest values of NMSE and MAE occur in the RBF neural network. In CME-SP, CBOT-US and EUREX-BUND, SVM has smaller NMSE and MAE values but BP has smaller values for DS; the reverse is true for CBOT-BO and MATIF-CAC40.
All values of NMSE are near or larger than 1.0, indicating that the financial datasets are very noisy. The smallest values of NMSE and MAE occur in SVM, followed by the RBF neural network. In terms of DS, the results are comparable among the three methods.
Results
In CME-SP, CBOT-BO, EUREX-BUND, and MATIF-CAC40, the smallest values of NMSE and MAE are found in SVM, followed by the RBF neural network. In CBOT-US, BP has the smallest NMSE and MAE, followed by RBF.
Paired t-test: SVM and RBF outperform BP at the α = 5% significance level (one-tailed test). There is no significant difference between SVM and RBF.
Experimental Analysis of Parameters C and δ2
Results
Too small a value of δ² causes the SVM to overfit the training data, while too large a value causes it to underfit.
Too small a value of C underfits the training data; when C is too large, the SVM overfits the training set, with a corresponding deterioration in generalization performance.
Thus δ² and C play an important role in the generalization performance of the SVM.
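The sensitivity to the kernel width can be reproduced on synthetic data. Note one assumption: scikit-learn parameterizes the Gaussian kernel by gamma, which behaves like 1/δ², so a tiny δ² corresponds to a large gamma.

```python
# Compare train vs validation MSE for a very narrow kernel (gamma=100,
# i.e. tiny delta^2, expected to overfit) and a moderate one (gamma=0.5).
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

results = {}
for gamma in (100.0, 0.5):
    m = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=gamma).fit(X_tr, y_tr)
    results[gamma] = (mean_squared_error(y_tr, m.predict(X_tr)),
                      mean_squared_error(y_val, m.predict(X_val)))
    print("gamma=%6.1f  train MSE %.4f  val MSE %.4f" % ((gamma,) + results[gamma]))
```

With the narrow kernel the training error is driven far below the validation error, the signature of overfitting described above.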
Experimental Analysis of Parameter ε
NMSE on the training and validation sets is very stable and relatively unaffected by changes in ε: the performance of the SVM is insensitive to ε. This result cannot be generalized, however, because the effect of ε on performance depends on the input dimension of the dataset.
The number of support vectors is a decreasing function of ε. Hence a large ε reduces the number of support vectors without affecting the performance of the SVM.
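The claim that the number of support vectors is a decreasing function of ε can be checked directly with a small sweep (again using scikit-learn's SVR on synthetic data):

```python
# As epsilon grows, the epsilon-tube covers more training points, so
# fewer points remain as support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

counts = []
for eps in (0.01, 0.05, 0.1, 0.2, 0.4):
    m = SVR(kernel="rbf", C=1.0, epsilon=eps, gamma=0.5).fit(X, y)
    counts.append(len(m.support_))
print(counts)  # roughly decreasing in epsilon
```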
Support Vector Machine With Adaptive Parameters (ASVM)
Modification of parameter C: the regularized risk function is the empirical error plus a regularization term, and increasing C increases the relative importance of the empirical error with respect to the regularization term. In ASVM, C is replaced by a point-dependent Ci = C · w(i), where w(i) is an increasing weight function of the sample index i.
The behavior of the weight function can be summarized as follows:
When a → 0, Ci → C for all i; hence E_ASVM = E_SVM.
When a → ∞, the weights approach a step function: the first half of the training points receive near-zero weight and the second half the maximum weight.
When a ∈ [0, ∞) and a increases, the weights for the first half of the training data points become smaller and those for the second half become larger.
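The ascending regularization can be sketched with a logistic weight of the form Ci = C · 2/(1 + exp(a − 2ai/l)). This specific formula is an assumption chosen to match the qualitative behavior stated above (a → 0 gives Ci = C; larger a shifts weight from distant to recent points); the paper's exact expression may differ.

```python
# Assumed logistic weighting for the adaptive regularization constant:
# C_i = C * 2 / (1 + exp(a - 2*a*i/l)), i = 1..l (i = l is the most recent).
import numpy as np

def adaptive_C(C, l, a):
    i = np.arange(1, l + 1)
    return C * 2.0 / (1.0 + np.exp(a - 2.0 * a * i / l))

Ci = adaptive_C(C=10.0, l=100, a=3.0)
# Early (distant) points get a small C_i, recent points approach 2C.
print(round(Ci[0], 3), round(Ci[-1], 3))
```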
Support Vector Machine With Adaptive Parameters (ASVM)
Modification of parameter ε: to make the solution of the SVM sparser, ε adopts a point-dependent form εi.
The proposed adaptive ε places more weight on recent training points than on distant ones: since the number of support vectors is a decreasing function of ε, recent training points receive more attention in the representation of the solution than distant points.
The behavior of the weight function can be summarized as follows:
When b → 0, εi → ε for all i; hence the weights on all training data points equal 1.0.
When b → ∞, the weights approach a step function: the tube is widest on the first half of the training points and narrowest on the second half.
When b ∈ [0, ∞) and b increases, the weights for the first half of the training data points become larger and those for the second half become smaller.
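The adaptive tube can be sketched with the mirrored logistic form εi = ε · (1 + exp(b − 2bi/l))/2, an assumed expression matching the behavior above: b → 0 recovers the standard tube, while larger b widens the tube on distant points (fewer support vectors there) and narrows it on recent ones.

```python
# Assumed mirrored-logistic weighting for the adaptive tube width:
# eps_i = eps * (1 + exp(b - 2*b*i/l)) / 2, i = 1..l (i = l most recent).
import numpy as np

def adaptive_eps(eps, l, b):
    i = np.arange(1, l + 1)
    return eps * (1.0 + np.exp(b - 2.0 * b * i / l)) / 2.0

ei = adaptive_eps(eps=0.1, l=100, b=3.0)
# Wide tube on distant points, narrow tube on recent ones.
print(round(ei[0], 4), round(ei[-1], 4))
```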
ASVM and Weighted BP Neural Network (WBP)
Regularized risk function in ASVM: the standard risk function with C and ε replaced by the point-dependent Ci and εi, i.e. R = (1/l) Σi Ci Lεi(yi, f(xi)) + (1/2) ||w||².
Corresponding dual function: identical in form to the standard dual, but with the box constraints ai, ai* ∈ [0, Ci] and per-point tube widths εi.
Weighted BP Neural Network: a BP network trained on a weighted empirical error, with the same weight function applied to the training points.
Weight update: the gradient-descent update contributed by each training point is scaled by that point's weight.
Results of ASVM
ASVM and WBP have smaller NMSE and MAE but larger DS than their corresponding standard methods.
ASVM outperforms SVM at α = 2.5%.
WBP outperforms BP at α = 10%.
ASVM outperforms WBP at α = 5%.
ASVM converges to fewer support vectors than the standard SVM.
Conclusions
SVM: a promising alternative tool to the BP neural network for financial time series forecasting.
Comparable performance between the regularized RBF neural network and SVM.
C and δ² have a great influence on the performance of the SVM. The number of support vectors can be reduced by using a larger ε, resulting in a sparse representation of the solution.
ASVM achieves higher generalization performance and uses fewer support
vectors than standard SVM in financial forecasting.
Future work: Investigate techniques to choose optimal values of the free
parameters of ASVM. Explore sophisticated weight functions that closely
follow dynamics of time series and further improve performance of ASVM.
THANK YOU!!!!