Forecasting values of commercial and residential property
using non-linear mathematical and statistical techniques
Chris Satchwell, Mandy Bradley
Technical Forecasts Ltd
Commercial House, 19 Station Road
Bognor Regis PO21 1QD
Phone / fax 01243-861110 / 861113
http://www.tfl.biz
The need for property forecasts
• Quantification of market direction gives best appraisal for most profitable asset
• Facilitates planning for acquisition or disposal further in advance of market turning points
• Saves valuable research time by presenting forecasts in easily analysable format
• Mass forecasting capability aids portfolio analysis by providing most current forecasts for all portfolio and candidate properties
Information in historical data
“Historical market performance is not a reliable indicator of future market behaviour”
Yet .. would anyone present disagree with the following statement:
“Historical data contains some information about future movement”
Relationships in other data series
• Leading indicators
  • Sought after, and implemented for many years, by property professionals
  • Recognise that traditionally strong leading indicators have shown diminished correlations, and hence a diminished ability to provide information, over the past few years.
• Parallel series
  • Similar to leading indicators, and have recognised mathematical relationships to the Target series
  • Improve the forecastability of a Target series
Overview Of Twin-Series Forecasting
Window Of 'Target' Time Series Values
Window Of 'Associated' Time Series Values
Choice of 'Associated' Series to Aid Forecast of Target Series
Multiple Models of Relationships between Target & Associated Series
Forecasts Derived From Models
Measuring associations between time series
• Form a 1 D histogram of ‘target’ time series values
• The data’s disorder (or entropy) is found by summing –p.ln(p), where ‘p’ is the probability of a ‘bin’ of the histogram.
• Call this E1.
• Form a 2 D histogram of values of both target and associated series and sum –p.ln(p) for each 2D ‘bin’. Call this E2.
• The REDUCTION in disorder from including the new series is (E1 – E2).
• The greater the reduction in disorder (Mutual Information), the stronger the association between the series.
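The histogram recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the choice of 8 equal-width bins are assumptions. It uses the standard definition of mutual information, H(target) + H(associated) – H(joint), which also counts the entropy of the associated series on its own; reading the slide's (E1 – E2) as shorthand for this is an assumption.

```python
import math
from collections import Counter

def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width histogram bin in [0, n_bins - 1]."""
    if hi == lo:
        return 0  # a constant series occupies a single bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls in the last bin

def entropy(counts, total):
    """Sum of -p.ln(p) over the occupied bins of a histogram."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def mutual_information(target, associated, n_bins=8):
    """Histogram estimate of H(target) + H(associated) - H(joint), in nats."""
    n = len(target)
    t_lo, t_hi = min(target), max(target)
    a_lo, a_hi = min(associated), max(associated)
    t_bins = [bin_index(v, t_lo, t_hi, n_bins) for v in target]
    a_bins = [bin_index(v, a_lo, a_hi, n_bins) for v in associated]
    h_target = entropy(Counter(t_bins).values(), n)              # 1 D histogram
    h_assoc = entropy(Counter(a_bins).values(), n)               # 1 D histogram
    h_joint = entropy(Counter(zip(t_bins, a_bins)).values(), n)  # 2 D histogram
    return h_target + h_assoc - h_joint
```

A series paired with itself scores its own entropy, while pairing it with a constant series scores zero. With short property series the estimate is sensitive to the number of bins, so a candidate ranking is best compared at a fixed bin count.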
Design issues
Requirements:
1. Capable of generating thousands of forecasts every month from committees of non-linear models
2. Robust
3. As accurate as possible
4. Models need to be complexity-optimised to avoid instabilities
5. Ideally, models should be uncorrelated
6. Capable of getting the best results possible from limited data
Possible network solutions
1. MLP’s – accurate but quirky, do not lend themselves to automation.
2. RBF’s – less forgiving of irrelevant inputs, but can be made accurate and robust. These are used. Unsupervised clustering gives centres of RBF’s.
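As an illustration of "unsupervised clustering gives centres", the sketch below places RBF centres with plain k-means and then evaluates Gaussian basis functions at an input. The slides do not say which clustering method or basis shape was used, so k-means, the Gaussian form, the single shared `width`, and the function names are all assumptions. It expects at least `k` data points.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; initial centres are evenly spaced picks from the data."""
    step = max(1, len(points) // k)
    centres = [list(points[i * step]) for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(group) for col in zip(*group)]
    return centres

def rbf_features(x, centres, width):
    """Gaussian activations exp(-|x - c_j|^2 / (2 width^2)), one per centre."""
    return [math.exp(-math.dist(x, c) ** 2 / (2.0 * width ** 2)) for c in centres]
```

The activations returned by `rbf_features` are the basis-function values that the network weights multiply; narrowing `width` gives the ‘spikier’ receptive fields.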
Possible Complexity Optimisation Solutions
1. Cross Validation – difficult with little data & difficult to automate interpretation of Error v. Complexity graphs
2. MAP – requires multi-dimensional integration capabilities that limit the dimensionality of the problem to which it can be applied. Not robust.
3. Evidence Approximation – issues on robustness, but this was used.
What they don’t tell you about the Evidence Approximation
• Implicit assumption is that a multi-dimensional surface can be fitted through data, such that the likelihood of any ‘noise’ data decays as you move away from the surface.
• It is a technique for multi-dimensional signal extraction from noisy data.
• If the data does not comply with these assumptions, or is pure noise, the method may fail.
• It finds an appropriate amount of regularisation to generalise an over-complicated model, and will not work if the initial model is too simple.
• It is easier to apply to RBF’s than MLP’s
• If it fails, it is probably an indication that the data cannot be sensibly modelled, which is useful to know.
What they do tell you about the Evidence Approximation
• Of all the possible weight-dependent models that could describe the data, a set of weights exists ($w_{MP}$) that produces a unique maximum for the probability of the model correctly representing the data.
• As the values of the weights diverge from $w_{MP}$, the probability of the model being correct decays.
• The two previous points imply that we expect the probability of a model being correct, viewed as a function of the weights, to be expressible as a multi-dimensional Gaussian surface.
• When the maths are worked through (Bishop Ch. 10) this is equivalent to adding a ‘sum of weights squared’ term to a least squares error function.
• A minimum of this function gives the weights that maximise the chances of the model being correct.
Basic formulae
$y = \sum_j w_j \phi_j(x)$  (1) Formula for RBF
$E_D = 0.5 \sum_n \left( t_n - \sum_j w_j \phi_j(x_n) \right)^2$  (2) L/S Err. Func.
$E_W = 0.5 \sum_j w_j^2$  (3) Wt. Comp. of Err. Func.
$\min_w \left[ \alpha E_W + \beta E_D \right]$  (4) Err. Func. to be minimised
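Solution formulae
$\left[ \sum_n \phi_i(x_n) \phi_j(x_n) \right] \{w\} = \left\{ \sum_n \phi_i(x_n) t_n \right\}$  (5) Soln. to (1)
$[H] \{w\} = \left\{ \sum_n \phi_i(x_n) t_n \right\}$  (6) ..
$\lambda_i$  (7) Eigenvalues of [H]
$W$  (8) Number of Weights
$2 \alpha E_W^{MP} = W - \sum_i \frac{\alpha}{\lambda_i + \alpha}$  (9) Condition for most probable model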
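One way to reach condition (9) in practice is the re-estimation loop from the evidence framework described in Bishop Ch. 10, sketched below with NumPy for a model that is linear in its weights. This is an assumption-laden illustration rather than the authors' implementation: the function name, the starting values, the fixed iteration count, the choice of $\beta \Phi^T \Phi$ as the matrix whose eigenvalues are used, and the update for $\beta$ (which the slides do not show) are all assumptions.

```python
import numpy as np

def evidence_fit(Phi, t, alpha=1.0, beta=1.0, n_iter=50):
    """Re-estimate alpha and beta for y = Phi @ w, where Phi[n, j] = phi_j(x_n)."""
    N, W = Phi.shape
    gram = Phi.T @ Phi
    eig = np.linalg.eigvalsh(gram)                 # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        A = alpha * np.eye(W) + beta * gram
        w_mp = beta * np.linalg.solve(A, Phi.T @ t)   # minimiser of (4)
        lam = beta * eig                              # eigenvalues of beta * Phi^T Phi
        gamma = W - np.sum(alpha / (lam + alpha))     # right-hand side of (9)
        E_W = 0.5 * w_mp @ w_mp                       # equation (3)
        E_D = 0.5 * np.sum((t - Phi @ w_mp) ** 2)     # equation (2)
        alpha = gamma / (2.0 * E_W)                   # enforces 2 alpha E_W = gamma
        beta = (N - gamma) / (2.0 * E_D)              # assumed standard noise update
    return w_mp, alpha, beta, gamma
```

The quantity `gamma` behaves as the number of well-determined weights. A run in which it collapses towards zero, or in which `E_D` reaches zero, is a practical sign of the failures discussed in the slides and would need handling in production code.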
Solutions
• The end product of this process (3,000,000 RBF’s / month) is either (1) a failure to achieve an answer or (2) a set of weights for the most probable RBF model fitting the data.
• In the event of a failure, it is possible to reduce the width of receptive fields or increase the number of basis functions to try to achieve success. In extremes, success is achieved with ‘spiky’ basis functions that probably offer a worse solution than one arrived at by a combination of eye and cross validation, but which is too simple to allow an evidence approximation solution.
• Overall conclusion is not to assume a solution works better in practice just because it has been derived using Bayesian methods.
Committee issues
• Models with uncorrelated errors can be combined to produce an overall error inversely proportional to their number.
• In practice, most models are correlated.
• We use models with different inputs in an attempt to reduce correlations between model errors.
• We have experimented with the covariance method and quadratic programming ($\min_z | z^T C z |$ s.t. $\sum_i z_i = 1$ and $0 \le z_i \le 1$), but currently use straight averaging of model outputs for our forecasts.
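The committee arithmetic can be illustrated with a small NumPy sketch. Here `C` is taken to be the covariance matrix of the individual models' errors and `z` the vector of committee weights, which is an interpretation of the slide's symbols. `covariance_weights` gives the closed-form minimiser of $z^T C z$ under the sum-to-one constraint only; it does not enforce $0 \le z_i \le 1$, so it is not the quadratic programme mentioned above. Both function names are hypothetical.

```python
import numpy as np

def committee_variance(C, z):
    """Expected squared error of the weighted committee, z^T C z."""
    return float(z @ C @ z)

def covariance_weights(C):
    """Minimiser of z^T C z subject to sum(z) = 1 (bounds on z not enforced)."""
    ones = np.ones(C.shape[0])
    raw = np.linalg.solve(C, ones)   # proportional to C^{-1} 1
    return raw / raw.sum()

# Four models with uncorrelated errors of equal variance 2.0:
C = 2.0 * np.eye(4)
straight_average = np.full(4, 0.25)
print(committee_variance(C, straight_average))   # variance of the averaged forecast
```

With uncorrelated, equal-variance errors the straight average already attains the minimum, and the committee variance is the single-model variance divided by the number of models. Off-diagonal entries in `C` erode that gain, which is the motivation for choosing models with different inputs.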
Forecasting issues
• The basis of a forecast is to sense if a relationship exists (Mutual Information), model it (RBF/ Evidence Approximation), assume it continues into the future, and use it to generate results.
• Where the relationships are strong and consistent, the answers tend to be good. Where they are weak or inconsistent, they may not be so good.
• This means that quality can never be guaranteed, only the ability to see how well we would have performed had we used the method on historic data, to produce a forecast that is capable of being compared with more recent data.
The Forecasting Process
Forecast each of 200 (approx) econometric and social series one step ahead
Far enough ahead? (n: store forecasts and forecast a further step; y: continue)
Forecast each of 30,000 (approx) postcode sector house price series, using econometric & social forecasts as associated series
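The first stage of this process, repeated one-step forecasting until the horizon is reached, can be sketched as a simple loop. The function names are hypothetical, `one_step` stands in for whatever model produces a single-step forecast, and feeding each stored forecast back in as input for the next step is an assumption about how the loop in the flow chart works.

```python
def forecast_ahead(series_by_name, one_step, horizon):
    """Roll every series forward `horizon` steps, one step at a time."""
    extended = {name: list(values) for name, values in series_by_name.items()}
    stored = {name: [] for name in series_by_name}
    for _ in range(horizon):                      # loop until "far enough ahead"
        step = {name: one_step(values) for name, values in extended.items()}
        for name, value in step.items():
            extended[name].append(value)          # forecast becomes input to the next step
            stored[name].append(value)            # "store forecasts"
    return stored

def drift(values):
    """Placeholder one-step model: repeat the last observed change."""
    return values[-1] + (values[-1] - values[-2])

indicators = {"lending_rate": [5, 6, 7], "employment": [90, 92, 94]}
print(forecast_ahead(indicators, drift, horizon=3))
```

The stored indicator forecasts would then serve as the associated series when each postcode sector series is forecast; that second stage is not shown here.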
Summary of forecasting
1. Determine the relationship between target series (eg Land Registry house price data) and economic and financial indicators (eg employment rates, construction indices, lending rates…)
2. Pick out the most significant series that share information with the target series
3. Forecast the target series alongside each of the parallel series
4. Fuse all this data to produce a single forecast that has the highest probability of replicating future movement
About Land Registry data
• Postcode: EX4 4QJ
• Postcode area – EX
  • 104 in England & Wales
• Postcode district – EX4
  • Around 2500 in England & Wales
  • Average around 20000 addresses in each
• Postcode sector – EX4 4
  • Average around 3000 addresses in each
  • but varying from under 500 to over 8000
• Detached
• Semi-detached
• Terraced
• Flats/Maisonettes
… at least 3 sales per type per quarter
Building the data set
LandReg data gives 1 quarter’s average prices to postcode sector level for each property type
… SO just add it to the previous quarter’s data ! ?
• Historic data updates
• Missing data
• Errors in data
• New postcodes / old postcodes
• … and so on
What sort of accuracy does this deliver?
Average deviation from actual, Sep00-Sep02
                        Most forecastable districts   Most forecastable sectors
Detached houses         -8.8%                         -7.9%
Semi-detached houses    -11.1%                        -10.5%
Terraced houses         -10.6%                        -9.9%
Flats / Maisonettes     -8.7%                         -8.3%
What sort of accuracy does this deliver?
Within 15% of LandReg actual, Sep00-Sep02
                        Most forecastable districts   Most forecastable sectors
Detached houses         77%                           76%
Semi-detached houses    62%                           69%
Terraced houses         68%                           73%
Flats / Maisonettes     75%                           72%
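For reference, the two kinds of figure reported above can be computed as in the sketch below. The slides do not state whether the average deviation is signed or absolute, so the function (a hypothetical name) returns both, along with the share of forecasts within a tolerance of the actual value. The sample numbers are made up purely to show the call.

```python
def accuracy_summary(forecasts, actuals, tolerance=0.15):
    """Return (mean signed deviation, mean absolute deviation, share within tolerance)."""
    devs = [(f - a) / a for f, a in zip(forecasts, actuals)]
    mean_signed = sum(devs) / len(devs)
    mean_abs = sum(abs(d) for d in devs) / len(devs)
    share_within = sum(abs(d) <= tolerance for d in devs) / len(devs)
    return mean_signed, mean_abs, share_within

# Illustrative values only, not Land Registry data.
print(accuracy_summary([110, 90, 100, 130], [100, 100, 100, 100]))
```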
Assessing accuracy
Residential data is sparse, and often highly volatile –
eg Detached houses in London N6:
[Chart: series N6 4 and N6 6, quarterly from Dec-99 to Dec-02; vertical axis 0 to 3,500,000]
Accuracy at a point in time?
• Accuracy at one specific period may be misleading as an overall measure
[Chart: series N6 4, N6 6 and N6 4 ma; vertical axis 0 to 3,500,000]
Volatility measure,
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (u_i - \bar{u})^2}$
where
$s$ = (SD of avge hse price) and
$u_i = \ln\left(\frac{S_i}{S_{i-1}}\right)$ = ln(return on hse value)
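The volatility measure translates directly into code. A minimal sketch, with hypothetical function names, taking a list of successive average prices $S_0, S_1, \dots, S_n$ (at least three values, all positive):

```python
import math

def log_returns(prices):
    """u_i = ln(S_i / S_{i-1}) for each pair of successive prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]

def volatility(prices):
    """Sample standard deviation of the log returns, with the 1/(n-1) factor."""
    u = log_returns(prices)
    n = len(u)
    mean_u = sum(u) / n
    return math.sqrt(sum((x - mean_u) ** 2 for x in u) / (n - 1))
```

A series growing at a constant rate has zero volatility under this measure, however steep the growth; only variation in the period-to-period returns contributes.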
Combined approach to forecastability
• RMS error, where ratio = RMS error / volatility
• If ratio >> 1, Model may be too simple for data
• If ratio << 1, Model might be trying to model noise
• …. except where << 1
Historic forecastability classification
• IF ( <0.1) AND ((0.9< <1.1) OR ( <0.05)) THEN = A
• IF ( <0.2) AND ((0.8< <1.25) OR ( <0.08)) THEN = B
• IF ( <0.35) AND ((0.6< <1.5) OR ( <0.12)) THEN = C
• ELSE = U
A (genuine) volatile property series:
… where next, for next 3 years?
[Chart: the series plotted yearly from Feb-87 to Feb-99; vertical axis 60 to 120]
Same series, with associated series:
Any easier?
[Chart: yearly from Feb-87 to Feb-99; vertical axis -50 to 200; legend: Index base '87, GDQL, BCIS, FBYN, GMPag, CKYW, FTAP]
How TFL managed:
[Chart: Actual index and Forecast from Mar 99, Jan-98 to Jan-02; vertical axis 70 to 100]
How TFL managed:
[Chart: Actual index, Forecast from Mar 99 and Forecast shifted by 10, Jan-98 to Jan-02; vertical axis 70 to 100]
Semi-detached houses in GU2 – forecasts vs actual from Dec99-Dec02
[Chart: GU2 forecast and GU2 act, quarterly from Dec-99 to Dec-02; vertical axis 90 to 135]
Terraced houses in UB6 - forecast vs actual, Dec99 to Dec02
[Chart: TFL forecast and Actual vals, quarterly from Dec-99 to Dec-02; vertical axis 90 to 160]
Central London office rental values forecast vs. actual from March 1999
[Chart: TFL forecast and Actual; vertical axis 100 to 160]
Forecast from IPD’s Regional Pages series
Uses to date include:
1. Investment decisions involving properties
2. Newspapers wanting content
3. Web sites seeking to increase ‘Stickiness’.
4. Future crime rates (for a police force)
5. Government decisions involving land, acquisition of computer / office equipment & other sundries that need forecasting.
…Questions??
…. Yes, Dr Nabney???