TRANSCRIPT
Support Vector Machine With Adaptive Parameters in Financial Time Series Forecasting
by L. J. Cao and Francis E. H. Tay
IEEE Transactions on Neural Networks, Vol. 14, No. 6, Nov. 2003
Presented by Pooja Hegde
CIS 525: Neural Computation
Spring 2004, Instructor: Dr. Vucetic
Presentation Outline Introduction
Motivation and introduction of a novel approach: SVM
Background SVMs in Regression Estimation
Application of SVMs in financial forecasting Experimental setup and results
Experimental analysis of SVM parameters and results
Adaptive Support Vector Machines (ASVM)
Experimental setup and results
Conclusions
Introduction
Financial time series forecasting is one of the most challenging applications of modern
time series prediction.
Characteristics:
Noisy: complete information from the past behavior of financial markets is unavailable,
so the dependency between future and past prices cannot be fully captured.
Non-stationary: the distribution of a financial time series changes over time.
The learning algorithm needs to incorporate this characteristic: information
given by recent data points should be weighted more heavily than that given by
distant data points.
Introduction (contd.)
Back-propagation (BP) neural networks have been used successfully for modeling financial time series.
BP neural networks are universal function approximators that can map any nonlinear function without a priori assumptions about the properties of the data.
They are effective in describing the dynamics of non-stationary time series due to their non-parametric, noise-tolerant and adaptive properties.
Then what's the problem!!
Need for a large number of controlling parameters.
Difficulty in obtaining a stable solution.
Danger of overfitting: the network captures not only the useful information in the training data but also the unwanted noise, which leads to poor generalization.
A Novel Approach: SVMs
Support Vector Machines are used in a number of areas ranging from pattern recognition to regression estimation.
Reason: remarkable characteristics of SVMs.
Good generalization performance: SVMs implement the Structural Risk Minimization principle, which seeks to minimize an upper bound on the generalization error rather than only the training error.
Absence of local minima: training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so the solution is unique and globally optimal.
Sparse representation of the solution: the SVM solution depends only on a subset of the training data points, called support vectors.
Background Theory of SVMs in Regression Estimation
Given a set of data points (x1, y1), (x2, y2), ..., (xl, yl) randomly and independently generated from an unknown function, the SVM approximates the function using the form

    f(x) = w · φ(x) + b

where φ(x) is the mapping of the inputs into a high-dimensional feature space. The coefficients w and b are estimated by minimizing the regularized risk function

    R = C (1/l) Σi Lε(yi, f(xi)) + (1/2) ||w||²

where Lε is the ε-insensitive loss, Lε(y, f(x)) = max(|y − f(x)| − ε, 0). To estimate w and b, the above is transformed into the primal problem by introducing positive slack variables ξi, ξi*.
Background Theory of SVMs in Regression Estimation (contd.)
Introducing Lagrange multipliers and exploiting the optimality constraints, the decision function takes the explicit form

    f(x) = Σi (ai − ai*) K(xi, x) + b

where ai and ai* are the Lagrange multipliers. They satisfy ai · ai* = 0, ai ≥ 0, ai* ≥ 0, and are obtained by maximizing the dual function

    W(a, a*) = Σi yi (ai − ai*) − ε Σi (ai + ai*) − (1/2) Σi Σj (ai − ai*)(aj − aj*) K(xi, xj)

subject to Σi (ai − ai*) = 0 and ai, ai* ∈ [0, C].
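The dual problem above is what off-the-shelf SVR solvers implement. A minimal sketch using scikit-learn's SVR (an assumed stand-in; the paper does not specify its software) on synthetic data illustrates the sparse representation: only points on or outside the ε-tube become support vectors.

```python
# Sketch: epsilon-SVR with a Gaussian (RBF) kernel on synthetic data,
# illustrating that the fitted function depends only on a subset of the
# training points (the support vectors). Not the paper's futures data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

# Points strictly inside the epsilon-tube get a_i = a_i* = 0 and drop out
# of the decision function, so the solution is sparse.
print(len(model.support_), "support vectors out of", len(X))
```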
Feasibility of Applying SVM in Financial Forecasting
Experimental Setup: Data Sets-
The daily closing prices of five real
futures contracts from the Chicago
Mercantile Market are used as datasets.
The original closing price is transformed into a
five-day relative difference in percentage of price (RDP).
Feasibility of Applying SVM in Financial Forecasting
Input variables are determined from four
lagged RDP values based on 5-day periods
(RDP-5, RDP-10, RDP-15, RDP-20) and
one transformed closing price (EMA100).
Output variable: RDP+5.
Z-score normalization is used for
normalizing the time series containing
outliers.
Walk-forward testing routine is used to divide
whole dataset into 5 overlapping training-validation-testing sets.
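A hedged sketch of the preprocessing just described, using the slide's definitions (RDP-k as a k-day percentage difference of price, EMA100 as the closing price minus a 100-day exponential moving average, RDP+5 as the 5-day-ahead RDP). The price series is synthetic, and any additional smoothing of the target used in the paper is omitted.

```python
# Build the five inputs (RDP-5/10/15/20, EMA100) and the target RDP+5
# from a synthetic daily closing-price series, following the slide's
# definitions: RDP-k = 100*(p_i - p_{i-k})/p_{i-k}.
import numpy as np
import pandas as pd

prices = pd.Series(100 + np.cumsum(np.random.default_rng(1).normal(size=600)))

def rdp(p, k):
    return 100 * (p - p.shift(k)) / p.shift(k)

features = pd.DataFrame({
    "RDP-5":  rdp(prices, 5),
    "RDP-10": rdp(prices, 10),
    "RDP-15": rdp(prices, 15),
    "RDP-20": rdp(prices, 20),
    "EMA100": prices - prices.ewm(span=100).mean(),
})
# RDP+5 at time i is the RDP over the NEXT 5 days: 100*(p_{i+5} - p_i)/p_i.
target = rdp(prices, 5).shift(-5).rename("RDP+5")

data = pd.concat([features, target], axis=1).dropna()
print(data.shape)
```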
Feasibility of Applying SVM in Financial Forecasting
Performance Criteria:
NMSE and MAE: measures of the deviation between actual and predicted values; smaller values indicate a better predictor.
DS: indicates the correctness of the predicted direction of RDP+5, given as a percentage; a larger value suggests a better predictor.
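The three criteria can be written down directly; the definitions below follow their common forms (the paper's exact DS formula may differ in small details):

```python
# Hedged implementations of the three performance criteria.
import numpy as np

def nmse(y, yhat):
    # Normalized mean squared error: MSE divided by the variance of y,
    # so NMSE near 1 means "no better than predicting the mean".
    y, yhat = np.asarray(y), np.asarray(yhat)
    return np.mean((y - yhat) ** 2) / np.var(y)

def mae(y, yhat):
    # Mean absolute error.
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))

def ds(y, yhat):
    # Directional symmetry: percentage of steps where the predicted change
    # has the same sign as the actual change.
    y, yhat = np.asarray(y), np.asarray(yhat)
    correct = (y[1:] - y[:-1]) * (yhat[1:] - y[:-1]) >= 0
    return 100 * np.mean(correct)

y = np.array([1.0, 2.0, 1.5, 2.5])
print(nmse(y, y), mae(y, y), ds(y, y))
```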
A Gaussian kernel is used as the kernel function of the SVM. The results on the validation set are used to choose the optimal parameters (C, ε and δ²) of the SVM.
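A sketch of validation-based parameter selection, assuming scikit-learn's SVR, whose gamma plays the role of the inverse kernel width δ²; the data and the candidate grids are illustrative, not the paper's:

```python
# Choose (C, epsilon, gamma) for an RBF-kernel SVR by minimizing error on
# a held-out validation set, as the paper does; synthetic data.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

best = None
for C in (0.1, 1.0, 10.0):
    for eps in (0.01, 0.1):
        for gamma in (0.1, 1.0):
            m = SVR(kernel="rbf", C=C, epsilon=eps, gamma=gamma).fit(X_tr, y_tr)
            err = mean_squared_error(y_val, m.predict(X_val))
            if best is None or err < best[0]:
                best = (err, C, eps, gamma)

print("best validation MSE %.4f with C=%s, eps=%s, gamma=%s" % best)
```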
Feasibility of Applying SVM in Financial Forecasting
Benchmarks
Standard 3-layer BP neural network with 5 input nodes and 1 output node.
The number of hidden nodes, the learning rate and the number of epochs are chosen based on the validation set.
Sigmoid transfer function for the hidden nodes and linear transfer function for the output node.
The network is trained by stochastic gradient descent.
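The BP benchmark under the stated setup can be approximated with scikit-learn's MLPRegressor (an assumed stand-in, not the authors' code): one hidden layer of sigmoid units, a linear output by construction, trained with SGD on synthetic 5-input data.

```python
# Sketch of the 3-layer BP benchmark: 5 inputs, sigmoid hidden layer,
# linear output, stochastic gradient descent. Synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))          # 5 input nodes
y = X @ np.array([0.5, -0.2, 0.1, 0.3, -0.4]) + rng.normal(scale=0.1, size=400)

bp = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                  solver="sgd", learning_rate_init=0.01,
                  max_iter=2000, random_state=0)
bp.fit(X, y)
print("training R^2:", round(bp.score(X, y), 3))
```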
Regularized RBF Neural Network
Minimizes a risk function consisting of the empirical error and a regularization term.
The regularized RBF neural network software used was developed by Muller et al. and can be downloaded from http://www.kernel-machines.org.
Centers, variances and output weights are all adjusted during training.
The number of hidden nodes and the regularization parameter are chosen based on the validation set.
Results
In all futures contracts, the largest values of NMSE and MAE occur in the RBF neural network. In CME-SP, CBOT-US and EUREX-BUND, SVM has smaller NMSE and MAE values but BP has smaller values for DS; the reverse is true for CBOT-BO and MATIF-CAC40.
All values of NMSE are near or larger than 1.0, indicating that the financial datasets are very noisy. The smallest values of NMSE and MAE occur in SVM, followed by the RBF neural network. In terms of DS, the results are comparable among the three methods.
Results
In CME-SP, CBOT-BO, EUREX-BUND, and MATIF-CAC40, the smallest values of NMSE and MAE are found in SVM, followed by the RBF neural network. In CBOT-US, BP has the smallest NMSE and MAE, followed by RBF.
Paired t-test: SVM and RBF outperform BP at the α = 5% significance level (one-tailed test). There is no significant difference between SVM and RBF.
Experimental Analysis of Parameters C and δ2
Results
Too small a value of δ² causes the SVM to overfit the training data, while too large a value causes it to underfit.
Too small a value of C underfits the training data; when C is too large, the SVM overfits the training set, with a corresponding deterioration in generalization performance.
Thus δ² and C play an important role in the generalization performance of the SVM.
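The sensitivity to the kernel width can be reproduced on synthetic data. Note one assumption: scikit-learn parameterizes the Gaussian kernel by gamma, which behaves like 1/δ², so a tiny δ² corresponds to a large gamma.

```python
# Compare train vs validation MSE for a very narrow kernel (gamma=100,
# i.e. tiny delta^2, expected to overfit) and a moderate one (gamma=0.5).
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

results = {}
for gamma in (100.0, 0.5):
    m = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=gamma).fit(X_tr, y_tr)
    results[gamma] = (mean_squared_error(y_tr, m.predict(X_tr)),
                      mean_squared_error(y_val, m.predict(X_val)))
    print("gamma=%6.1f  train MSE %.4f  val MSE %.4f" % ((gamma,) + results[gamma]))
```

With the narrow kernel the training error is driven far below the validation error, the signature of overfitting described above.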
Experimental Analysis of Parameter ε
NMSE on the training and validation sets is very stable and relatively unaffected by changes in ε: the performance of the SVM is insensitive to ε. This result cannot be generalized, however, because the effect of ε on performance depends on the input dimension of the dataset.
The number of support vectors is a decreasing function of ε. Hence a large ε reduces the number of support vectors without affecting the performance of the SVM.
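The claim that the number of support vectors is a decreasing function of ε can be checked directly with a small sweep (again using scikit-learn's SVR on synthetic data):

```python
# As epsilon grows, the epsilon-tube covers more training points, so
# fewer points remain as support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

counts = []
for eps in (0.01, 0.05, 0.1, 0.2, 0.4):
    m = SVR(kernel="rbf", C=1.0, epsilon=eps, gamma=0.5).fit(X, y)
    counts.append(len(m.support_))
print(counts)  # roughly decreasing in epsilon
```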
Support Vector Machine With Adaptive Parameters (ASVM)
Modification of parameter C: the regularized risk function is the empirical error plus a regularization term, and increasing C increases the relative importance of the empirical error with respect to the regularization term. In ASVM, C is replaced by a point-dependent Ci = C · w(i), where w(i) is an increasing weight function of the sample index i.
The behavior of the weight function can be summarized as follows:
When a → 0, Ci → C for all i; hence E_ASVM = E_SVM.
When a → ∞, the weights approach a step function: the first half of the training points receive near-zero weight and the second half the maximum weight.
When a ∈ [0, ∞) and a increases, the weights for the first half of the training data points become smaller and those for the second half become larger.
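The ascending regularization can be sketched with a logistic weight of the form Ci = C · 2/(1 + exp(a − 2ai/l)). This specific formula is an assumption chosen to match the qualitative behavior stated above (a → 0 gives Ci = C; larger a shifts weight from distant to recent points); the paper's exact expression may differ.

```python
# Assumed logistic weighting for the adaptive regularization constant:
# C_i = C * 2 / (1 + exp(a - 2*a*i/l)), i = 1..l (i = l is the most recent).
import numpy as np

def adaptive_C(C, l, a):
    i = np.arange(1, l + 1)
    return C * 2.0 / (1.0 + np.exp(a - 2.0 * a * i / l))

Ci = adaptive_C(C=10.0, l=100, a=3.0)
# Early (distant) points get a small C_i, recent points approach 2C.
print(round(Ci[0], 3), round(Ci[-1], 3))
```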
Support Vector Machine With Adaptive Parameters (ASVM)
Modification of parameter ε: to make the solution of the SVM sparser, ε adopts a point-dependent form εi.
The proposed adaptive ε places more weight on recent training points than on distant ones: since the number of support vectors is a decreasing function of ε, recent training points receive more attention in the representation of the solution than distant points.
The behavior of the weight function can be summarized as follows:
When b → 0, εi → ε for all i; hence the weights on all training data points equal 1.0.
When b → ∞, the weights approach a step function: the tube is widest on the first half of the training points and narrowest on the second half.
When b ∈ [0, ∞) and b increases, the weights for the first half of the training data points become larger and those for the second half become smaller.
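The adaptive tube can be sketched with the mirrored logistic form εi = ε · (1 + exp(b − 2bi/l))/2, an assumed expression matching the behavior above: b → 0 recovers the standard tube, while larger b widens the tube on distant points (fewer support vectors there) and narrows it on recent ones.

```python
# Assumed mirrored-logistic weighting for the adaptive tube width:
# eps_i = eps * (1 + exp(b - 2*b*i/l)) / 2, i = 1..l (i = l most recent).
import numpy as np

def adaptive_eps(eps, l, b):
    i = np.arange(1, l + 1)
    return eps * (1.0 + np.exp(b - 2.0 * b * i / l)) / 2.0

ei = adaptive_eps(eps=0.1, l=100, b=3.0)
# Wide tube on distant points, narrow tube on recent ones.
print(round(ei[0], 4), round(ei[-1], 4))
```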
ASVM and Weighted BP Neural Network (WBP)
Regularized risk function in ASVM: the standard risk function with C and ε replaced by the point-dependent Ci and εi, i.e. R = (1/l) Σi Ci Lεi(yi, f(xi)) + (1/2) ||w||².
Corresponding dual function: identical in form to the standard dual, but with the box constraints ai, ai* ∈ [0, Ci] and per-point tube widths εi.
Weighted BP Neural Network: a BP network trained on a weighted empirical error, with the same weight function applied to the training points.
Weight update: the gradient-descent update contributed by each training point is scaled by that point's weight.
Results of ASVM
ASVM and WBP have smaller NMSE and MAE but larger DS than their corresponding standard methods.
ASVM outperforms SVM at α = 2.5%.
WBP outperforms BP at α = 10%.
ASVM outperforms WBP at α = 5%.
ASVM converges to fewer support vectors than the standard SVM.
Conclusions
SVM: a promising alternative tool to the BP neural network for financial time series forecasting.
Comparable performance between the regularized RBF neural network and SVM.
C and δ² have a great influence on the performance of the SVM. The number of support vectors can be reduced by using a larger ε, resulting in a sparse representation of the solution.
ASVM achieves higher generalization performance and uses fewer support
vectors than standard SVM in financial forecasting.
Future work: Investigate techniques to choose optimal values of the free
parameters of ASVM. Explore sophisticated weight functions that closely
follow dynamics of time series and further improve performance of ASVM.
THANK YOU!!!!