Can we reliably forecast individual 3G usage data?
An analysis using mathematical simulation of time series algorithms
Cosmo Zheng
Background
• Fluctuations in daily demand for bandwidth make ordinary usage pricing inefficient
• Solution: Time-dependent pricing to persuade users to defer usage
http://scenic.princeton.edu/tube/overview.html
Our Problem
• Users must be informed of expected future prices so they can assess the cost of deferring usage
• We need a reliable way to predict future usage based on past data
http://scenic.princeton.edu/tube/technology.html
The Algorithms
• Nonlinear regression – generate a fitted function of the form D + A*sin(2πt/24) + B*sin(2πt/12) + C*sin(2πt/6)
• Use fitted function to extrapolate
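Because the fitted form is linear in the coefficients D, A, B and C, ordinary least squares is enough to estimate them. The sketch below is one possible way to do the fit and the extrapolation in plain Python; the function names (`design_row`, `solve`, `fit_sinusoids`, `predict`) are illustrative and not taken from the original analysis.

```python
import math

PERIODS = (24, 12, 6)  # periods in hours, as in the fitted form above


def design_row(t):
    """Basis values [1, sin(2*pi*t/24), sin(2*pi*t/12), sin(2*pi*t/6)] at hour t."""
    return [1.0] + [math.sin(2 * math.pi * t / p) for p in PERIODS]


def solve(a, b):
    """Solve a small square system a @ x = b by Gaussian elimination
    with partial pivoting (assumes the system is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_sinusoids(ts, xs):
    """Least-squares estimates [D, A, B, C] via the normal equations."""
    rows = [design_row(t) for t in ts]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atx = [sum(r[i] * x for r, x in zip(rows, xs)) for i in range(k)]
    return solve(ata, atx)


def predict(coeffs, t):
    """Evaluate the fitted function at hour t (t may lie beyond the training data)."""
    return sum(c * b for c, b in zip(coeffs, design_row(t)))
```

Fitting on hours 1 to 96 and calling `predict(coeffs, t)` for t = 97 to 120 would give the extrapolated fifth day.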
Algorithms (cont.)
• Time series decomposition – isolate trend, seasonal, and residual components
• Extend trend and seasonal components into the future
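As a rough illustration of these two steps, the sketch below performs a classical additive decomposition with a 24-hour period and then extends the components. Several choices here are assumptions rather than details from the slides: the trend is estimated with a centered moving average, it is extended with a least-squares straight line, and hours are indexed from 0.

```python
def decompose(x, period=24):
    """Classical additive decomposition: x = trend + seasonal + residual.

    Assumes an even period and at least two full periods of data.
    """
    n, half = len(x), period // 2
    if period % 2 or n < 2 * period:
        raise ValueError("need an even period and at least two full periods")
    # Centered moving average (half weight on the two end points).
    trend = [None] * n
    for i in range(half, n - half):
        inner = sum(x[i - half + 1 : i + half])
        trend[i] = (0.5 * x[i - half] + inner + 0.5 * x[i + half]) / period
    # Seasonal effect: mean detrended value at each position in the period.
    buckets = [[] for _ in range(period)]
    for i, tr in enumerate(trend):
        if tr is not None:
            buckets[i % period].append(x[i] - tr)
    seasonal = [sum(b) / len(b) for b in buckets]
    mean = sum(seasonal) / period
    seasonal = [s - mean for s in seasonal]  # effects sum to zero
    residual = [
        None if tr is None else x[i] - tr - seasonal[i % period]
        for i, tr in enumerate(trend)
    ]
    return trend, seasonal, residual


def forecast(x, horizon, period=24):
    """Extend trend and seasonal components `horizon` steps past the data."""
    trend, seasonal, _ = decompose(x, period)
    pts = [(i, tr) for i, tr in enumerate(trend) if tr is not None]
    # Assumption: the trend continues along a least-squares straight line.
    mi = sum(i for i, _ in pts) / len(pts)
    mt = sum(tr for _, tr in pts) / len(pts)
    slope = sum((i - mi) * (tr - mt) for i, tr in pts) / sum(
        (i - mi) ** 2 for i, _ in pts
    )
    n = len(x)
    return [
        mt + slope * (i - mi) + seasonal[i % period]
        for i in range(n, n + horizon)
    ]
```

The residual component is dropped in the forecast, since by construction it carries no systematic pattern to extend.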
Algorithms (cont.)
• Exponential smoothing – generate {S_t} based on a weighted average of previous data
• Simplest form is S_1 = X_0, S_t = αX_{t-1} + (1-α)S_{t-1} for t > 1, where α is a smoothing factor
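The recursion above translates almost directly into code. The following is a minimal sketch; the function name and the 0-based list layout are illustrative choices, and the restriction of α to (0, 1] is the usual convention rather than something stated on the slide.

```python
def exponential_smoothing(x, alpha):
    """Return [S_1, ..., S_n] for observations x = [X_0, ..., X_{n-1}].

    S_1 = X_0 and S_t = alpha * X_{t-1} + (1 - alpha) * S_{t-1} for t > 1,
    so S_t depends only on data up to X_{t-1} and can serve as a
    one-step-ahead forecast of X_t.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not x:
        return []
    s = [x[0]]  # S_1 = X_0
    for t in range(2, len(x) + 1):
        s.append(alpha * x[t - 1] + (1 - alpha) * s[-1])
    return s
```

For example, `exponential_smoothing([10, 20, 30], 0.5)` gives `[10, 15.0, 22.5]`. A larger α weights recent observations more heavily; a smaller α produces a smoother, slower-reacting series.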
The Data
• Use simulated datasets, representing usage each hour over 5 days
• {X_t} for 1 ≤ t ≤ 120
• First 4 days are historical data (training set), 5th day is the test set
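The slides do not say how the datasets were simulated, so the generator below is purely hypothetical: a daily cycle plus Gaussian noise, with arbitrarily chosen amplitudes and noise level. It is only meant to show the shape of the data (120 hourly values) and the 96/24 split into training and test sets.

```python
import math
import random

HOURS_PER_DAY = 24
N_DAYS = 5


def simulate_usage(seed=0):
    """Hypothetical generator: daily cycle plus noise, clipped at zero.

    The constants (10, 4, 2, 1.5) are illustrative, not from the source.
    """
    rng = random.Random(seed)
    series = []
    for t in range(1, N_DAYS * HOURS_PER_DAY + 1):  # t = 1..120
        cycle = 10 + 4 * math.sin(2 * math.pi * t / 24) + 2 * math.sin(2 * math.pi * t / 12)
        series.append(max(0.0, cycle + rng.gauss(0, 1.5)))
    return series


def split_train_test(series, train_days=4):
    """First `train_days` days form the training set, the rest the test set."""
    cut = train_days * HOURS_PER_DAY
    return series[:cut], series[cut:]
```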
Algorithm 1: Regression
Regression (cont.)
R² = 0.424
Algorithm 2: Decomposition
Decomposition (cont.)
R² = 0.693
Algorithm 3: Smoothing
Smoothing (cont.)
R² = 0.516
Additional Trials
Sum of absolute error
Trial #    Regression    Decomposition    Smoothing
1          64.1          46.2             56.4
2          76            47.4             61.1
3          65.5          53.9             53.4
4          61.7          48.9             46.8
5          58.8          43.1             53.3
6          68.9          43.5             51.3
7          59.1          45.4             40.8
8          59.6          56.6             58.6
9          75.6          56.4             59.2
10         52.8          46.9             54.1
Average    64.21         48.83            53.5

R²
Trial #    Regression    Decomposition    Smoothing
1          0.424         0.693            0.516
2          0.374         0.721            0.455
3          0.388         0.577            0.543
4          0.53          0.601            0.593
5          0.383         0.687            0.527
6          0.382         0.64             0.682
7          0.515         0.722            0.783
8          0.457         0.459            0.389
9          0.506         0.612            0.719
10         0.468         0.507            0.348
Average    0.4427        0.6219           0.5555
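For reference, the two accuracy measures reported here could be computed on the test day as sketched below. The slides do not define R², so the code assumes the common coefficient-of-determination form 1 - SS_res/SS_tot; if the original analysis used a squared correlation instead, the values would differ.

```python
def sum_abs_error(actual, predicted):
    """Sum of |actual - predicted| over the test period (lower is better)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Under this definition a perfect forecast scores 1, and a forecast that always predicts the test-set mean scores 0.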
Conclusions
• Time series decomposition gave the most accurate predictions of future usage on average (mean R² 0.62, mean sum of absolute error 48.8), followed by exponential smoothing (0.56, 53.5), then regression (0.44, 64.2)
• Possible explanation: the usage pattern is strongly cyclic, repeating on a daily basis
• Suggestion: investigate better ways of isolating the seasonal component; more sophisticated algorithms exist (e.g. ARIMA, stochastic volatility models)