![Page 1: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/1.jpg)
Business intelligence III: Feature generation and model selection for multiscale time series forecasting
Vadim Strijov
Moscow Institute of Physics and Technology
AINL FRUCT 2016, 10-12 November
1 / 68
![Page 2: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/2.jpg)
Model selection for time series forecasting
The Internet of Things is the world of networking devices (portables, vehicles, buildings) embedded with sensors and software.
- Environment and energy monitoring
- Medical and health monitoring
- Consumer support, sales monitoring
- Urban management and manufacturing
2 / 68
![Page 3: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/3.jpg)
Case 1. Energy consumption and price forecasting, 1-day ahead hourly
The components of multivariate time series with periodicity

Time series:
- energy price,
- consumption,
- daytime,
- temperature,
- humidity,
- wind force,
- holiday schedule.

Periodicity:
- one-year seasons (temperature, daytime),
- one week,
- one day (working day, weekend),
- a holiday,
- aperiodic events.
3 / 68
![Page 4: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/4.jpg)
Energy consumption one-week forecast for each hour
4 / 68
![Page 5: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/5.jpg)
The autoregressive matrix, five weeks
[Figure: the autoregressive matrix shown as an image; x-axis: Hours, y-axis: Days]
5 / 68
![Page 6: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/6.jpg)
The autoregressive matrix and the linear model
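$$
\underset{(m+1)\times(n+1)}{\mathbf{X}^*} =
\begin{pmatrix}
s_T & s_{T-1} & \dots & s_{T-\kappa+1} \\
s_{(m-1)\kappa} & s_{(m-1)\kappa-1} & \dots & s_{(m-2)\kappa+1} \\
\dots & \dots & \dots & \dots \\
s_{n\kappa} & s_{n\kappa-1} & \dots & s_{n(\kappa-1)+1} \\
\dots & \dots & \dots & \dots \\
s_{\kappa} & s_{\kappa-1} & \dots & s_1
\end{pmatrix}
=
\begin{pmatrix}
\underset{1\times 1}{s_T} & \underset{1\times n}{\mathbf{x}_{m+1}} \\
\underset{m\times 1}{\mathbf{y}} & \underset{m\times n}{\mathbf{X}}
\end{pmatrix}.
$$

In terms of linear regression: $\mathbf{y} = \mathbf{f}(\mathbf{X}, \mathbf{w}) = \mathbf{X}\mathbf{w}$, and $y_{m+1} = s_T = \langle \mathbf{x}_{m+1}, \mathbf{w} \rangle$.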
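The folding of a series into the autoregressive matrix and the linear forecast can be sketched as follows. This is a minimal illustration, assuming NumPy, a series whose length is a multiple of the period `kappa`, and ordinary least squares as the way to estimate $\mathbf{w}$ (the slide only states that the model is linear). The function names are hypothetical.

```python
import numpy as np

def autoregressive_matrix(s, kappa):
    """Fold a series into rows of length kappa, newest row first.

    Each row is reversed in time, so column 0 holds the latest sample
    of the period (the target) and the other columns hold the features.
    """
    s = np.asarray(s, dtype=float)
    if len(s) % kappa != 0:
        raise ValueError("series length must be a multiple of kappa")
    rows = s.reshape(-1, kappa)       # oldest period first, time ascending
    return rows[::-1, ::-1]           # newest period first, time descending

def fit_and_forecast(s, kappa):
    """Fit y = Xw on the older rows and forecast the newest target s_T."""
    full = autoregressive_matrix(s, kappa)
    x_new = full[0, 1:]               # x_{m+1}: features of the newest row
    y, X = full[1:, 0], full[1:, 1:]  # y (m x 1) and X (m x n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(x_new @ w), w

# Toy series: the last sample of every period is the sum of the first three.
periods = [[1, 2, 3, 6], [2, 1, 1, 4], [0, 1, 2, 3], [3, 0, 1, 4], [1, 1, 1, 3]]
series = np.concatenate([np.array(p, dtype=float) for p in periods])
forecast, w = fit_and_forecast(series, kappa=4)
```

In this toy case the value $s_T$ is known and only serves as a check; in a real forecast the top-left cell of the matrix is the unknown to be predicted.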
6 / 68
![Page 7: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/7.jpg)
The one-day forecast: expected error is 3.1% working day, 3.7% week-end
[Figure: history and forecast of the hourly values over one day; x-axis: Hours, y-axis: Value (×10^4)]
The model $\mathbf{y} = \mathbf{f}(\mathbf{X}, \mathbf{w})$ could be a linear model, a neural network, a deep NN, an SVM, ...
7 / 68
![Page 8: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/8.jpg)
Structure of energy consumption
8 / 68
![Page 9: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/9.jpg)
Similarity of daily consumption
9 / 68
![Page 10: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/10.jpg)
Sunrise bias: one-year daytime and consumption
10 / 68
![Page 11: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/11.jpg)
Biased and original daytime to fit consumption over years
11 / 68
![Page 12: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/12.jpg)
One-hour line, day-by-day during a year: autoregressive analysis
12 / 68
![Page 13: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/13.jpg)
Daily similarity
13 / 68
![Page 14: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/14.jpg)
The model performance criteria and forecast errors
Stability:
- the error does not change significantly following small changes in the time series,
- the distribution of the model parameters does not change.

Complexity:
- the number of parameters (elements in the superposition) is minimal,
- the minimum description length principle follows William of Ockham's rule.

Error: the residual $\varepsilon_j = y_j - \hat{y}_j$ defines the residual sum of squares and the (symmetric) mean absolute percentage error:

$$
\text{RSS} = \sum_{j=1}^{r} \varepsilon_j^2, \qquad
\text{MAPE} = \frac{1}{r}\sum_{j=1}^{r} \frac{|\varepsilon_j|}{|y_j|}, \qquad
\text{sMAPE} = \frac{1}{r}\sum_{j=1}^{r} \frac{2|\varepsilon_j|}{|\hat{y}_j + y_j|}.
$$
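The three error measures can be computed directly from their definitions. The sketch below uses plain Python lists; the function names and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def rss(y, y_hat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error (as a fraction, not in percent)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)

def smape(y, y_hat):
    """Symmetric mean absolute percentage error (as a fraction)."""
    return sum(2 * abs(a - b) / abs(a + b) for a, b in zip(y, y_hat)) / len(y)

y_true = [100.0, 200.0, 400.0]   # hypothetical observed values
y_pred = [110.0, 180.0, 400.0]   # hypothetical forecasts
```

Note that MAPE is undefined when some $y_j = 0$, and sMAPE when $\hat{y}_j + y_j = 0$; the sketch does not guard against these cases.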
14 / 68
![Page 15: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/15.jpg)
Design matrix
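A forecast is a mapping from the $n$-dimensional object space to the $r$-dimensional answer space.

$$
\mathbf{X}^* =
\begin{pmatrix}
\underset{1\times n}{\mathbf{x}} & \underset{1\times r}{\mathbf{y}} \\
\underset{m\times n}{\mathbf{X}} & \underset{m\times r}{\mathbf{Y}}
\end{pmatrix}
$$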
15 / 68
![Page 16: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/16.jpg)
Rolling validation
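$$
\mathbf{Z} =
\begin{pmatrix}
\dots & \dots \\
\underset{1\times n}{\mathbf{x}_{\text{val},k}} & \underset{1\times r}{\mathbf{y}_{\text{val},k}} \\
\underset{m_{\min}\times n}{\mathbf{X}_{\text{train},k}} & \underset{m_{\min}\times r}{\mathbf{Y}_{\text{train},k}} \\
\dots & \dots
\end{pmatrix}
$$

The rolling validation procedure:
1) construct the validation vector $\mathbf{x}^*_{\text{val},k}$ for the time series of length $\Delta t_r$ as the first row of the design matrix $\mathbf{Z}$,
2) construct the remaining rows of the design matrix $\mathbf{Z}$ for the time after $t_k$ and present it in the block form of $\mathbf{Z}$,
3) optimize the model parameters $\mathbf{w}$ using $\mathbf{X}_{\text{train},k}, \mathbf{Y}_{\text{train},k}$ and compute the residuals $\boldsymbol{\varepsilon}_k = \mathbf{y}_{\text{val},k} - \mathbf{f}(\mathbf{x}_{\text{val},k}, \mathbf{w})$ and MAPE,
4) increase $k$ and repeat.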
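A rolling validation loop of this kind might look as follows. The sketch assumes NumPy, rows of the design matrix ordered oldest first (the slide draws the newest row on top), a fixed training window of `m_min` rows, and a linear least-squares model as a stand-in for $\mathbf{f}$. All names and the synthetic data are hypothetical.

```python
import numpy as np

def rolling_validation(X, Y, m_min, fit, predict):
    """Roll a window of m_min training rows over the design matrix.

    At step k the model is fitted on rows k .. k+m_min-1 and validated
    on row k+m_min. Returns the MAPE of every validation step.
    """
    errors = []
    for k in range(len(X) - m_min):
        X_train, Y_train = X[k:k + m_min], Y[k:k + m_min]
        x_val, y_val = X[k + m_min], Y[k + m_min]
        w = fit(X_train, Y_train)
        residual = y_val - predict(x_val, w)
        errors.append(float(np.mean(np.abs(residual) / np.abs(y_val))))
    return errors

def fit_linear(X, Y):
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return w

def predict_linear(x, w):
    return x @ w

# Noise-free synthetic data: a linear model should validate with ~zero error.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(30, 3))
W_true = np.array([[1.0, 2.0], [0.5, 1.0], [3.0, 1.0]])
Y = X @ W_true
errors = rolling_validation(X, Y, m_min=10, fit=fit_linear, predict=predict_linear)
```

Any other model can be plugged in through the `fit` and `predict` arguments, which is the point of keeping the loop separate from the model.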
16 / 68
![Page 17: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/17.jpg)
Case 2. Sales planning: to forecast the goods consumption
Retailers' daily routines:
- custom inventory,
- calculation of optimal insurance stocks,
- consumer demand forecasting.

- Given are historical time series of the off-take volumes of foodstuffs.
- Let the time series be homoscedastic: its variance is constant in time.
- The next sample must be forecast so that the loss function is minimized.
17 / 68
![Page 18: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/18.jpg)
Custom inventory
18 / 68
![Page 19: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/19.jpg)
Excessive forecast and insufficient forecast lead to loss
19 / 68
![Page 20: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/20.jpg)
The performance criterion is minimum loss of money
Error functions: quadratic, linear, asymmetric.
20 / 68
![Page 21: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/21.jpg)
The time series of residues and its histogram
Given a histogram

$$H = \{X_i, g_i\}_{i=1}^{m}$$

and a loss function

$$L = L(Z, X),$$

the optimal forecast is the minimizer of their convolution:

$$\hat{X} = \arg\min_{Z \in \{X_1, \dots, X_m\}} \sum_{i=1}^{m} g_i L(Z, X_i).$$
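The minimization over the histogram values can be written in a few lines. The sketch below is illustrative: the histogram, the cost coefficients `under` and `over`, and the function names are assumptions chosen to show how an asymmetric loss shifts the forecast.

```python
def optimal_forecast(values, weights, loss):
    """Return the histogram value Z minimizing sum_i g_i * L(Z, X_i)."""
    def expected_loss(z):
        return sum(g * loss(z, x) for x, g in zip(values, weights))
    return min(values, key=expected_loss)

def asymmetric_loss(z, x, under=3.0, over=1.0):
    """Linear loss: a shortage (z < x) costs `under` per unit, a surplus `over`."""
    return under * (x - z) if z < x else over * (z - x)

values = [10, 20, 30, 40]        # histogram bin values X_i
weights = [0.1, 0.4, 0.4, 0.1]   # relative frequencies g_i

# Shortage is three times as expensive as surplus: forecast high.
high = optimal_forecast(values, weights, asymmetric_loss)
# Surplus is three times as expensive as shortage: forecast low.
low = optimal_forecast(
    values, weights, lambda z, x: asymmetric_loss(z, x, under=1.0, over=3.0)
)
```

With this symmetric histogram the forecast moves from 30 to 20 when the cost asymmetry is reversed, which illustrates why the loss function, not the histogram alone, determines the optimal order quantity.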
21 / 68
![Page 22: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/22.jpg)
Candies: the seasonality and trend weekly over three years
22 / 68
![Page 23: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/23.jpg)
Beverage: the week periodicity daily over four weeks
23 / 68
![Page 24: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/24.jpg)
Sparkling wine: holidays weekly over three years
24 / 68
![Page 25: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/25.jpg)
Promotional actions
25 / 68
![Page 26: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/26.jpg)
Promotional profile extraction
- Hypothesis: the shape of the profile (excluding the profile parameters) does not depend on the duration of the action.
- Problem: to forecast the customer demand during the promotional action.
26 / 68
![Page 27: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/27.jpg)
Forecast the residues to boost performance
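1. Model $f$ forecasts the $n(g)$ most recent history samples $x^f_t, \dots, x^f_{t-n(g)+1}$, one sample at a time.
2. Compute the $n(g)$ residuals $\varepsilon_t, \dots, \varepsilon_{t-n(g)+1}$ as $\varepsilon_{t-k} = x_{t-k} - x^f_{t-k}$.
3. Function $g$ forecasts the residuals $\varepsilon_{t+i}$ ahead for $\max(i)$ time ticks.
4. Combine the forecasts $x^{f,g}_{t+i} = x^f_{t+i} + \varepsilon_{t+i}$, computing $f$ for each sample $x^f_{t+i}$.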
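The four steps can be sketched with two deliberately simple stand-in models. Here $f$ is a naive last-value forecast and $g$ extrapolates the mean residual as a drift; both are illustrative assumptions, as are the function names. Any one-step model can be passed in their place.

```python
def forecast_with_residuals(x, f, g, n_g, horizon):
    """Forecast `horizon` steps with model f, corrected by residual model g.

    f(history) returns a one-step forecast of the series,
    g(residuals, i) returns the residual forecast i steps ahead.
    """
    t = len(x)
    # Steps 1-2: one-step forecasts of the last n_g known samples and their residuals.
    residuals = [x[j] - f(x[:j]) for j in range(t - n_g, t)]
    # Steps 3-4: extend the history with f and add the forecast residuals.
    history, combined = list(x), []
    for i in range(1, horizon + 1):
        x_f = f(history)
        combined.append(x_f + g(residuals, i))
        history.append(x_f)
    return combined

def last_value(history):
    """Model f: the naive forecast repeats the last observed value."""
    return history[-1]

def mean_residual(residuals, i):
    """Model g: the mean residual accumulates as a drift over i steps."""
    return i * sum(residuals) / len(residuals)

x = [1.0, 3.0, 5.0, 7.0, 9.0]
forecast = forecast_with_residuals(x, last_value, mean_residual, n_g=3, horizon=2)
```

On this linear toy series the naive model alone would forecast 9 twice, while the residual correction recovers the trend.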
27 / 68
![Page 28: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/28.jpg)
Case 3. Forecasting volumes of Russian railways freight transportation
Keep the hierarchical structure of the time series without losing performance

Forecast with hierarchical aggregation of
- types of freight in
- stations, regions, and roads,
- for a day, week, month, and quarter,
- counting all combinations above.

Satisfy the conditions:
- minimize error,
- incorporate important external factors,
- respect hierarchical structure,
- do not exceed physical bounds of forecast values.
28 / 68
![Page 29: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/29.jpg)
The railroad map counts ∼ 78 regions, ∼ 4000 stations, and ∼ 100 rail-yards
29 / 68
![Page 30: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/30.jpg)
Independent forecasts might be inconsistent with aggregated ones
[Figure: time series values versus time; x-axis: Time, y-axis: Time series]
30 / 68
![Page 31: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/31.jpg)
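Hierarchical data $\chi$, independent forecasts $\hat{\chi}$ and reconciled forecasts $\varphi$

$$
\chi_t = \begin{pmatrix} x_t(:,:) \\ \dots \\ x_t(n,1) \\ \dots \\ x_t(n,m) \end{pmatrix}, \quad
\hat{\chi} = \begin{pmatrix} \hat{x}(:,:) \\ \dots \\ \hat{x}(n,1) \\ \dots \\ \hat{x}(n,m) \end{pmatrix}, \quad
\varphi = \begin{pmatrix} y(:,:) \\ \dots \\ y(n,1) \\ \dots \\ y(n,m) \end{pmatrix}.
$$

Consistency condition: $\mathbf{S}\chi_t = 0$, $t = 1, \dots, T$, where the link matrix $\mathbf{S}$ of size $(2+n+m) \times (1+n+m+nm)$ has the form

$$
\mathbf{S} = \left(\begin{array}{c|ccc|ccc|cccc|c|cccc}
-1 & 1 & \dots & 1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & \dots & 0 & 0 & \dots & 0 \\
-1 & 0 & \dots & 0 & 1 & \dots & 1 & 0 & 0 & \dots & 0 & \dots & 0 & 0 & \dots & 0 \\
0 & -1 & \dots & 0 & 0 & \dots & 0 & 1 & 1 & \dots & 1 & \dots & 0 & 0 & \dots & 0 \\
\multicolumn{16}{c}{\dots} \\
0 & 0 & \dots & -1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & \dots & 1 & 1 & \dots & 1 \\
0 & 0 & \dots & 0 & -1 & \dots & 0 & 1 & 0 & \dots & 0 & \dots & 1 & 0 & \dots & 0 \\
\multicolumn{16}{c}{\dots} \\
0 & 0 & \dots & 0 & 0 & \dots & -1 & 0 & 0 & \dots & 1 & \dots & 0 & 0 & \dots & 1
\end{array}\right).
$$

The regions:

$$x_t(:,:) = \sum_{i=1}^{n} x_t(i,:);$$

The freights:

$$x_t(:,:) = \sum_{j=1}^{m} x_t(:,j);$$

Freights given a region:

$$x_t(i,:) = \sum_{j=1}^{m} x_t(i,j), \quad i = 1, \dots, n;$$

Regions given a freight:

$$x_t(:,j) = \sum_{i=1}^{n} x_t(i,j), \quad j = 1, \dots, m;$$

$t = 1, \dots, T$.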
31 / 68
![Page 32: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/32.jpg)
Performance of independent and reconciled forecasts

Given the link matrix $\mathbf{S}$, admissible sets $A$, $B$ and independent forecasts $\hat{\chi} \notin A$, $\hat{\chi} \in B$, make a reconciled forecast $\varphi$ subject to
- consistency: $\varphi \in A$, $A = \{\chi \in \mathbb{R}^d \mid \mathbf{S}\chi = 0\}$,
- physical limitations: $\varphi \in B$,
- precision: $l_h(\chi_{T+1}, \varphi) \le l_h(\chi_{T+1}, \hat{\chi})$ for any hierarchical data $\chi_{T+1} \in A \cap B$.

Theorem [Maria Stenina, 2014]. Given the listed conditions, the projection vector

$$\varphi = \hat{\chi}_{\text{proj}} = \arg\min_{\chi \in A \cap B} l_h(\chi, \hat{\chi})$$

is guaranteed to satisfy the requirements of consistency, physical limitations and precision.

The solution of the optimization problem

$$\varphi = \arg\min_{\chi \in A \cap B} \|\chi - \hat{\chi}\|_2^2$$

demonstrates a decrease in loss for all control samples.

[Figure: loss difference $L_t = \|\chi_t - \varphi\|_2^2 - \|\chi_t - \hat{\chi}\|_2^2$ (y-axis, ×10^6) versus control point (x-axis)]
32 / 68
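For the Euclidean loss and the consistency set $A$ alone, the projection has a closed form. The sketch below builds a link matrix for a small hierarchy and projects an inconsistent forecast onto $\{\chi : \mathbf{S}\chi = 0\}$. It assumes NumPy, a vector layout of total, region sums, freight sums, then region-major cells, and it ignores the physical bounds $B$; it is an illustration of the idea, not the procedure from the cited theorem.

```python
import numpy as np

def link_matrix(n, m):
    """Link matrix S for n regions and m freight types.

    Assumed layout: [total, n region sums, m freight sums, n*m cells (region-major)].
    """
    def cell(i, j):
        return 1 + n + m + i * m + j

    S = np.zeros((2 + n + m, 1 + n + m + n * m))
    S[0, 0] = -1
    S[0, 1:1 + n] = 1                        # total = sum of regions
    S[1, 0] = -1
    S[1, 1 + n:1 + n + m] = 1                # total = sum of freights
    for i in range(n):                       # region i = sum over its freights
        S[2 + i, 1 + i] = -1
        S[2 + i, [cell(i, j) for j in range(m)]] = 1
    for j in range(m):                       # freight j = sum over regions
        S[2 + n + j, 1 + n + j] = -1
        S[2 + n + j, [cell(i, j) for i in range(n)]] = 1
    return S

def reconcile(S, chi_hat):
    """Orthogonal projection of chi_hat onto the subspace {chi : S chi = 0}."""
    return chi_hat - np.linalg.pinv(S) @ (S @ chi_hat)

S = link_matrix(2, 2)
# Consistent data: cells [[1, 2], [3, 4]], regions [3, 7], freights [4, 6], total 10.
chi_true = np.array([10.0, 3.0, 7.0, 4.0, 6.0, 1.0, 2.0, 3.0, 4.0])
noise = np.array([0.8, -0.3, 0.5, 0.2, -0.6, 0.1, -0.2, 0.4, -0.1])
chi_hat = chi_true + noise                   # independent, inconsistent forecasts
phi = reconcile(S, chi_hat)
```

Because the true data lie in the subspace and an orthogonal projection is non-expansive, the reconciled forecast is never farther from the truth than the independent one. The pseudo-inverse is used because the rows of $\mathbf{S}$ are linearly dependent.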
![Page 33: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/33.jpg)
Case 4. Internet of things, multiscale dataset for vector forecasting
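[Figure: multiscale time series over days: Energy, Max T., Min T., Precipitation, Wind, Humidity, Solar; sampling periods $\tau$ and $\tau'$ are marked, with time stamps $t_i$, $t_{i+1}$ and $t'_i = i\tau'$, $t'_{i+1}$]
Each real-valued time series $\mathbf{s} = [s_1, \dots, s_i, \dots, s_T]$, $s_i = s(t_i)$, $0 \le t_i \le t_{\max}$, is a sequence of observations of some real-valued signal $s(t)$ with its own sampling rate $\tau$.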
33 / 68
![Page 34: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/34.jpg)
To boost the forecast quality include external variables
34 / 68
![Page 35: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/35.jpg)
Generate features with nonparametric transformation functions
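Univariate

| Formula | Output dimension |
|---|---|
| $\sqrt{x}$ | 1 |
| $x\sqrt{x}$ | 1 |
| $\arctan x$ | 1 |
| $\ln x$ | 1 |
| $x \ln x$ | 1 |

Bivariate

| Name | Formula |
|---|---|
| Plus | $x_1 + x_2$ |
| Minus | $x_1 - x_2$ |
| Product | $x_1 \cdot x_2$ |
| Division | $x_1 / x_2$ |
| | $x_1\sqrt{x_2}$ |
| | $x_1 \ln x_2$ |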
35 / 68
![Page 36: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/36.jpg)
Nonparametric aggregation: sample statistics
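Nonparametric transformations include basic data statistics:
- Sum or average value of each row $\mathbf{x}_i$, $i = 1, \dots, m$:
  $$\phi_i = \sum_{j=1}^{n} x_{ij}, \quad \text{or} \quad \phi'_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}.$$
- Min and max values: $\phi_i = \min_j x_{ij}$, $\phi'_i = \max_j x_{ij}$.
- Standard deviation:
  $$\phi_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \text{mean}(\mathbf{x}_i))^2}.$$
- Data quantiles: $\boldsymbol{\phi}_i = [X_1, \dots, X_K]$, where
  $$\frac{1}{n}\sum_{j=1}^{n} [X_{k-1} < x_{ij} \le X_k] = \frac{1}{K}, \quad \text{for } k = 1, \dots, K.$$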
36 / 68
![Page 37: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/37.jpg)
Nonparametric transformations: Haar’s transform
Applying Haar's transform produces multiscale representations of the same data.

Assume that $n = 2^K$ and initialize $\phi^{(0)}_{i,j} = \phi'^{(0)}_{i,j} = x_{ij}$ for $j = 1, \dots, n$.

To obtain coarse-graining and fine-graining of the input feature vector $\mathbf{x}_i$, for $k = 1, \dots, K$ repeat:
- the data averaging step
  $$\phi^{(k)}_{i,j} = \frac{\phi^{(k-1)}_{i,2j-1} + \phi^{(k-1)}_{i,2j}}{2}, \quad j = 1, \dots, \frac{n}{2^k},$$
- and the data differencing step
  $$\phi'^{(k)}_{i,j} = \frac{\phi'^{(k-1)}_{i,2j} - \phi'^{(k-1)}_{i,2j-1}}{2}, \quad j = 1, \dots, \frac{n}{2^k}.$$

The resulting multiscale feature vectors are $\boldsymbol{\phi}_i = [\boldsymbol{\phi}^{(1)}_i, \dots, \boldsymbol{\phi}^{(K)}_i]$ and $\boldsymbol{\phi}'_i = [\boldsymbol{\phi}'^{(1)}_i, \dots, \boldsymbol{\phi}'^{(K)}_i]$.
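The two recursions translate directly into code. The sketch below follows the formulas as stated on the slide, so the differencing step is applied to the previous level of differences rather than to the averages as in the textbook Haar wavelet. The function name is hypothetical.

```python
def haar_features(x):
    """Multiscale averages and differences of a row of length n = 2**K.

    Returns the concatenated levels k = 1..K of the averaging recursion
    and of the differencing recursion.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    avg, diff = list(x), list(x)          # level 0 of both recursions
    averages, differences = [], []
    while len(avg) > 1:
        avg = [(avg[2 * j] + avg[2 * j + 1]) / 2 for j in range(len(avg) // 2)]
        diff = [(diff[2 * j + 1] - diff[2 * j]) / 2 for j in range(len(diff) // 2)]
        averages.extend(avg)
        differences.extend(diff)
    return averages, differences

averages, differences = haar_features([4.0, 2.0, 6.0, 8.0])
```

For a row of length $n = 2^K$ each output has $n - 1$ entries: $n/2$ from the first level down to a single value at level $K$.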
37 / 68
![Page 38: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/38.jpg)
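Monotone functions

By growth rate:

| Function name | Formula | Constraints |
|---|---|---|
| Linear | $w_1 x + w_0$ | |
| Exponential rate | $\exp(w_1 x + w_0)$ | $w_1 > 0$ |
| Polynomial rate | $\exp(w_1 \ln x + w_0)$ | $w_1 > 1$ |
| Sublinear polynomial rate | $\exp(w_1 \ln x + w_0)$ | $0 < w_1 < 1$ |
| Logarithmic rate | $w_1 \ln x + w_0$ | $w_1 > 0$ |
| Slow convergence | $w_0 + w_1 / x$ | $w_1 \ne 0$ |
| Fast convergence | $w_0 + w_1 \cdot \exp(-x)$ | $w_1 \ne 0$ |

Other:

| Function name | Formula | Constraints |
|---|---|---|
| Soft ReLU | $\ln(1 + e^x)$ | |
| Sigmoid | $1 / (w_0 + \exp(-w_1 x))$ | $w_1 > 0$ |
| Softmax | $1 / (1 + \exp(-x))$ | |
| Hyperbolic tangent | $\tanh(x)$ | |
| Softsign | $\frac{|x|}{1 + |x|}$ | |

38 / 68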
![Page 39: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/39.jpg)
Collection of parametric transformation functions
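| Function name | Formula | Output dim. | Num. of args | Num. of pars |
|---|---|---|---|---|
| Add constant | $x + w$ | 1 | 1 | 1 |
| Quadratic | $w_2 x^2 + w_1 x + w_0$ | 1 | 1 | 3 |
| Cubic | $w_3 x^3 + w_2 x^2 + w_1 x + w_0$ | 1 | 1 | 4 |
| Logarithmic sigmoid | $1 / (w_0 + \exp(-w_1 x))$ | 1 | 1 | 2 |
| Exponent | $\exp x$ | 1 | 1 | 0 |
| Normal | $\frac{1}{w_1\sqrt{2\pi}} \exp\left(-\frac{(x - w_2)^2}{2 w_1^2}\right)$ | 1 | 1 | 2 |
| Multiply by constant | $x \cdot w$ | 1 | 1 | 1 |
| Monomial | $w_1 x^{w_2}$ | 1 | 1 | 2 |
| Weibull-2 | $w_1 w_2 x^{w_2 - 1} \exp(-w_1 x^{w_2})$ | 1 | 1 | 2 |
| Weibull-3 | $w_1 w_2 x^{w_2 - 1} \exp(-w_1 (x - w_3)^{w_2})$ | 1 | 1 | 3 |
| ... | ... | ... | ... | ... |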
39 / 68
![Page 40: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/40.jpg)
Parametric transformations
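Optimization of the transformation function parameters $\mathbf{b}$ is iterative:

1. Fix the vector $\hat{\mathbf{b}}$, collected over all the primitive functions $\{g\}$ that generate the features $\boldsymbol{\phi}$, and optimize the model parameters:
   $$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} S(\mathbf{w} \mid \mathbf{f}(\mathbf{w}, \mathbf{x}), \mathbf{y}), \quad \text{where } \boldsymbol{\phi}(\hat{\mathbf{b}}, \mathbf{s}) \subseteq \mathbf{x}.$$
2. Optimize the transformation parameters $\hat{\mathbf{b}}$ given the model parameters $\hat{\mathbf{w}}$:
   $$\hat{\mathbf{b}} = \arg\min_{\mathbf{b}} S(\mathbf{b} \mid \mathbf{f}(\hat{\mathbf{w}}, \mathbf{x}), \mathbf{y}).$$

Repeat these steps until the vectors $\hat{\mathbf{w}}$, $\hat{\mathbf{b}}$ converge.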
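The alternation between the two steps can be illustrated on a model with one generated feature. In the sketch below the primitive function is $\phi(b, s) = \exp(-bs)$, the model is $f = w\,\phi(b, s)$, the error $S$ is the squared error, and step 2 is a plain grid search. All of these are assumptions made for the example; the slides do not fix the optimizers.

```python
import math

def feature(b, s):
    """Primitive transformation phi(b, s) = exp(-b * s) (illustrative choice)."""
    return [math.exp(-b * v) for v in s]

def loss(w, b, s, y):
    """Squared error S of the one-feature linear model f = w * phi(b, s)."""
    return sum((w * p - t) ** 2 for p, t in zip(feature(b, s), y))

def alternate(s, y, b_grid, b_init, n_iter=20):
    """Alternate between the model parameter w and the transformation parameter b."""
    b, history = b_init, []
    for _ in range(n_iter):
        phi = feature(b, s)
        # Step 1: b is fixed, the optimal w has a closed form (least squares).
        w = sum(p * t for p, t in zip(phi, y)) / sum(p * p for p in phi)
        # Step 2: w is fixed, pick the best b on the grid.
        b = min(b_grid, key=lambda c: loss(w, c, s, y))
        history.append(loss(w, b, s, y))
    return w, b, history

s = [0.1 * i for i in range(21)]
y = [3.0 * math.exp(-1.5 * v) for v in s]
b_grid = [0.05 * i for i in range(1, 61)]
w, b, history = alternate(s, y, b_grid, b_init=b_grid[9])
```

Each step can only lower the error for the parameters it updates, so the recorded error sequence is non-increasing. This guarantees convergence of the error value, not necessarily convergence to the global optimum.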
40 / 68
![Page 41: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/41.jpg)
Markup the time series, two types of marks: Up and Down
41 / 68
![Page 42: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/42.jpg)
Parameters of the local models
More feature generation options:
- Parameters of the SSA approximation of the time series $\mathbf{x}^{(q)}$.
- Parameters of the FFT of each $\mathbf{x}^{(q)}$.
- Parameters of the polynomial/spline approximation of each $\mathbf{x}^{(q)}$.
42 / 68
![Page 43: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/43.jpg)
Parameters of the local models: SSA
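For the time series $\mathbf{s}$ construct the Hankel matrix with a period $k$ and shift $p$, so that for $\mathbf{s} = [s_1, \dots, s_T]$ the matrix is

$$
\mathbf{H}^* = \begin{pmatrix}
s_T & \dots & s_{T-k+1} \\
\vdots & \ddots & \vdots \\
s_{k+p} & \dots & s_{1+p} \\
s_k & \dots & s_1
\end{pmatrix}, \quad \text{where } 1 \le p \le k.
$$

Reconstruct the regression to the first column of the matrix $\mathbf{H}^* = [\mathbf{h}, \mathbf{H}]$ and denote its least squares parameters as the feature vector

$$\boldsymbol{\phi}(\mathbf{s}) = \arg\min \|\mathbf{h} - \mathbf{H}\boldsymbol{\phi}\|_2^2.$$

For the original feature vector $\mathbf{x} = [\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(Q)}]$ use the parameters $\boldsymbol{\phi}(\mathbf{x}^{(q)})$, $q = 1, \dots, Q$, as the features.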
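The feature vector $\boldsymbol{\phi}(\mathbf{s})$ can be computed with a standard least-squares solver. The sketch below assumes NumPy and stacks the rows oldest first, which does not change the least-squares solution. The function name and the test signal are hypothetical.

```python
import numpy as np

def ssa_features(s, k, p=1):
    """Least-squares parameters of the regression of the first column of the
    Hankel-type matrix H* = [h, H] on its remaining columns."""
    s = np.asarray(s, dtype=float)
    starts = range(0, len(s) - k + 1, p)
    # Each row is a window of k samples in reverse time order.
    H_star = np.array([s[j:j + k][::-1] for j in starts])
    h, H = H_star[:, 0], H_star[:, 1:]
    phi, *_ = np.linalg.lstsq(H, h, rcond=None)
    return phi

# A sampled sinusoid satisfies s[t+1] = 2*cos(omega)*s[t] - s[t-1].
t = np.arange(50)
phi = ssa_features(np.sin(0.3 * t), k=3)
```

For a pure sinusoid the recovered parameters are the coefficients of its linear recurrence, so the features encode the frequency of the signal independently of its amplitude.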
43 / 68
![Page 44: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/44.jpg)
SSA: principal components
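Compute the SVD of the covariance matrix of $\mathbf{H}$,

$$\frac{1}{N}\mathbf{H}^{\mathsf{T}}\mathbf{H} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{T}}, \quad \boldsymbol{\Lambda} = \text{diag}(\lambda_1, \dots, \lambda_N),$$

and find the principal components $\mathbf{y}_j = \mathbf{H}\mathbf{v}_j$.

[Figure: trajectory in the plane of the first two principal components; x-axis: principal component $y_1$, y-axis: principal component $y_2$]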
44 / 68
![Page 45: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/45.jpg)
Models and features
Models:
- Baseline method: $\hat{s}_i = s_{i-1}$.
- Multivariate linear regression (MLR) with $l_2$-regularization. Regularization coefficient: 2.
- SVR with multiple output. Kernel type: RBF, p1: 2, p2: 0, $\gamma$: 0.5, $\lambda$: 4.
- Feed-forward ANN with a single hidden layer, size: 25.
- Random forest (RF). Number of trees: 25, number of variables for each decision split: 48.

Feature combinations:
- History: the standard regression-based forecast with no additional features.
- SSA, Cubic, Conv, Centroids, NW: history + a particular feature.
- All: all of the above, with no feature selection.
- PCA and NPCA: all generation strategies with feature selection.
45 / 68
![Page 46: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/46.jpg)
Feature analysis
[Figure: test subset MAPE for the models MLR, MSVR, RF, ANN and the feature combinations History, SSA, Cubic, Conv, Centroids, NW, All, PCA, NPCA]
Ratio of times each combination of model and feature performed best for at least one of the time series (7) or error functions (6), over all (6) data sets ($6 \times 7 \times 6 = 252$ cases).
46 / 68
![Page 47: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/47.jpg)
Best models
[Figure, left panel: test subset residues for the series Energy, Max T., Min T., Precipitation, Wind, Humidity, Solar; legend: ANN Conv, MLR History, MSVR SSA, RF Cubic]
[Figure, right panel: standard deviations of test subset residues for the same series; legend: MLR History, MSVR SSA, RF Cubic]
47 / 68
![Page 48: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/48.jpg)
Case 5. Classification of human physical activity with mobile devices
3D projection of the acceleration time series onto the spatial axes:

$$\mathbf{x} = \{\text{acc}_x(t); \text{acc}_y(t); \text{acc}_z(t)\}_{t=1}^{n} \mapsto y \in \mathbb{R}^S.$$

[Figure, two panels: "Slow walking" and "Jogging"; x-axis: Time $t$, s; y-axis: Acceleration, $x$, $y$, $z$]
48 / 68
![Page 49: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/49.jpg)
Local transformations for deep learning neural network
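The model $\mathbf{f} = \mathbf{a}(\mathbf{h}_N(\dots \mathbf{h}_1(\mathbf{x})))(\mathbf{w})$ contains autoencoders $\mathbf{h}_k$ and a softmax classifier $\mathbf{a}$:

$$\mathbf{f}(\mathbf{w}, \mathbf{x}) = \frac{\exp(\mathbf{a}(\mathbf{x}))}{\sum_j \exp(a_j(\mathbf{x}))}, \quad \mathbf{a}(\mathbf{x}) = \mathbf{W}_2^{\mathsf{T}} \tanh(\mathbf{W}_1^{\mathsf{T}}\mathbf{x}), \quad \mathbf{h}_k(\mathbf{x}) = \sigma(\mathbf{W}_k\mathbf{x} + \mathbf{b}_k),$$

where $\mathbf{w}$ minimizes the error function.

Feature generation by local transformations, namely
- parameters of the SSA approximation of the time series $\mathbf{x}$,
- FFT of $\mathbf{x}$,
- parameters of the polynomial/spline approximation,

could reduce the complexity of this model down to the complexity of logistic regression.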
49 / 68
![Page 50: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/50.jpg)
Parameters of the local models: SSA
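For the time series $\mathbf{s}$ construct the Hankel matrix with a period $k$ and shift $p$, so that for $\mathbf{s} = [s_1, \dots, s_T]$ the matrix is

$$
\mathbf{H}^* = \begin{pmatrix}
s_T & \dots & s_{T-k+1} \\
\vdots & \ddots & \vdots \\
s_{k+p} & \dots & s_{1+p} \\
s_k & \dots & s_1
\end{pmatrix}.
$$

[Figure: trajectory in the plane of the first two principal components; x-axis: principal component $y_1$, y-axis: principal component $y_2$]

Reconstruct the regression to the first column of the matrix $\mathbf{H}^* = [\mathbf{h}, \mathbf{H}]$ and denote its least squares parameters as the feature vector

$$\boldsymbol{\phi}(\mathbf{s}) = \arg\min \|\mathbf{h} - \mathbf{H}\boldsymbol{\phi}\|_2^2.$$

For the original feature vector $\mathbf{x} = [\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(Q)}]$ use the parameters $\boldsymbol{\phi}(\mathbf{x})$ as an item.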
50 / 68
![Page 51: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/51.jpg)
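Human gait detection with time series segmentation
Find the dissection of the trajectory of the principal components $\mathbf{y}_j = \mathbf{H}\mathbf{v}_j$, where $\mathbf{H}$ is the Hankel matrix and $\mathbf{v}_j$ are the eigenvectors of its covariance matrix:

$$\frac{1}{N}\mathbf{H}^{\mathsf{T}}\mathbf{H} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{T}}, \quad \boldsymbol{\Lambda} = \text{diag}(\lambda_1, \dots, \lambda_N).$$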
51 / 68
![Page 52: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/52.jpg)
Metric features: distances to the centroids of local clusters
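Apply the kernel trick to the time series.
1. For objects $\mathbf{x}_i$ from $\mathbf{X}$ compute the k-means centroids $\mathbf{c}$.
2. Use the distance function $\rho$ to combine the feature vector

$$\boldsymbol{\phi}_i = [\rho(\mathbf{c}_1, \mathbf{x}_i), \dots, \rho(\mathbf{c}_p, \mathbf{x}_i)] \in \mathbb{R}^p_+.$$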
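The two steps can be sketched with a small k-means routine and the Euclidean distance as $\rho$. The sketch assumes NumPy, plain Lloyd iterations with user-supplied initial centroids, and toy two-dimensional objects in place of time series. For real series an alignment-based distance may be more appropriate; that choice is outside this example.

```python
import numpy as np

def kmeans_centroids(X, init, n_iter=50):
    """Plain Lloyd iterations starting from the given initial centroids."""
    c = np.array(init, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(len(c)):
            if np.any(labels == j):       # an empty cluster keeps its centroid
                c[j] = X[labels == j].mean(axis=0)
    return c

def centroid_features(X, centroids):
    """phi_i = [rho(c_1, x_i), ..., rho(c_p, x_i)] with Euclidean rho."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids = kmeans_centroids(X, init=X[[0, 3]])
features = centroid_features(X, centroids)
```

Each object is then described by $p$ non-negative numbers, one per centroid, regardless of the dimension of the original object.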
52 / 68
![Page 53: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/53.jpg)
Computing distances to centroids of the time series: sources
[Figure, six panels of source time series $x(t)$ versus $t$: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying]
53 / 68
![Page 54: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/54.jpg)
Computing distances to centroids of the time series: aligned series
[Figure, six panels of aligned time series $x(t)$ versus $t$: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying]
54 / 68
![Page 55: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/55.jpg)
Computing distances to centroids of the time series: centroids
[Figure, six panels of class centroids $x(t)$ versus $t$: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying]
55 / 68
![Page 56: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/56.jpg)
Performance of the human physical activities classification model
[Figure: number of objects per class label, annotated with per-class accuracy; x-axis: Class labels, y-axis: Objects number]
Per-class accuracy for class labels 1 to 12, in plot order: 99.5%, 97.8%, 98.7%, 98.8%, 97.5%, 100.0%, 99.4%, 93.4%, 97.0%, 99.9%, 99.4%, 97.2%.
Mean accuracy: 0.9823

Class labels:
1) walk forward
2) walk left
3) walk right
4) go upstairs
5) go downstairs
6) run forward
7) jump up and down
8) sit and fidget
9) stand
10) sleep
11) elevator up
12) elevator down
56 / 68
![Page 57: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/57.jpg)
Case 6. How many parameters must be used? Relations of 24 hourly models
[Figure: parameters of the 24 hourly models; x-axis: Hours, y-axis: Parameters]
57 / 68
![Page 58: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/58.jpg)
Selection of a stable set of features of restricted size
The sample contains multicollinear features $\chi_1, \chi_2$ and noisy features $\chi_5, \chi_6$, which are columns of the design matrix $\mathbf{X}$. We want to select two features from six.

Stability and accuracy for a fixed complexity

The solution: $\chi_3, \chi_4$ is an orthogonal set of features minimizing the error function.
58 / 68
![Page 59: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/59.jpg)
Multicollinear features and the forecast: possible configurations
[Figure, four panels: inadequate and correlated; adequate and random; adequate and redundant; adequate and correlated]
59 / 68
![Page 60: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/60.jpg)
Model parameter values with regularization
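Vector-function $\mathbf{f} = \mathbf{f}(\mathbf{w}, \mathbf{X}) = [f(\mathbf{w}, \mathbf{x}_1), \dots, f(\mathbf{w}, \mathbf{x}_m)]^{\mathsf{T}} \in \mathbb{Y}^m$.

[Figure, left panel: parameters $\mathbf{w}$ versus regularization $\tau$ for $S(\mathbf{w}) = \|\mathbf{f}(\mathbf{w}, \mathbf{X}) - \mathbf{y}\|^2 + \gamma^2\|\mathbf{w}\|^2$]
[Figure, right panel: parameters $\mathbf{w}$ versus the parameters sum $\sum_i |w_i|$ for $S(\mathbf{w}) = \|\mathbf{f}(\mathbf{w}, \mathbf{X}) - \mathbf{y}\|^2$, $T(\mathbf{w}) \le \tau$]
60 / 68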
![Page 61: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/61.jpg)
Empirical distribution of model parameters
Given a sample $\{\mathbf{w}_1, \dots, \mathbf{w}_K\}$ of realizations of the multivariate random variable $\mathbf{w}$ and an error function $S(\mathbf{w} \mid D, \mathbf{f})$, consider the set of points $\{s_k = \exp(-S(\mathbf{w}_k \mid D, \mathbf{f})) \mid k = 1, \dots, K\}$.

[Figure, two panels over the parameters $w_1$, $w_2$; x- and y-axes: parameters $\mathbf{w}$, z-axis: $\exp(-S(\mathbf{w}))$]
61 / 68
![Page 62: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/62.jpg)
Minimize number of similar and maximize number of relevant features
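Introduce a feature selection method QP(Sim, Rel) to solve the optimization problem

$$\mathbf{a}^* = \arg\min_{\mathbf{a} \in \mathbb{B}^n} \mathbf{a}^{\mathsf{T}}\mathbf{Q}\mathbf{a} - \mathbf{b}^{\mathsf{T}}\mathbf{a},$$

where the matrix $\mathbf{Q} \in \mathbb{R}^{n \times n}$ of pairwise similarities of features $\chi_i$ and $\chi_j$ is

$$\mathbf{Q} = [q_{ij}] = \text{Sim}(\chi_i, \chi_j) = \left| \frac{\text{Cov}(\chi_i, \chi_j)}{\sqrt{\text{Var}(\chi_i)\text{Var}(\chi_j)}} \right|$$

and the vector $\mathbf{b} \in \mathbb{R}^n$ of feature relevances to the target is

$$\mathbf{b} = [b_i] = \text{Rel}(\chi_i),$$

where the elements $b_i$ equal the absolute values of the sample correlation coefficient between the feature $\chi_i$ and the target vector $\mathbf{y}$.

The number of correlated features Sim → min; the number of features correlated to the target Rel → max.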
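On a small problem the criterion can be evaluated by exhaustive search. The sketch below assumes NumPy and adds a fixed subset size `d` as an extra constraint, an assumption made here so that the empty set is not the trivial minimizer. A practical implementation would use a quadratic programming solver instead of enumeration. The data are synthetic orthogonal patterns chosen so that the correlations are easy to verify by hand.

```python
import itertools
import numpy as np

def qp_select(X, y, d):
    """Pick d features minimizing a^T Q a - b^T a by exhaustive search."""
    n = X.shape[1]
    Q = np.abs(np.corrcoef(X, rowvar=False))                       # Sim
    b = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(n)])  # Rel
    best, best_value = None, np.inf
    for subset in itertools.combinations(range(n), d):
        a = np.zeros(n)
        a[list(subset)] = 1.0
        value = a @ Q @ a - b @ a
        if value < best_value:
            best, best_value = subset, value
    return best, best_value

h1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
# Columns: relevant, its noisy near-duplicate, relevant, irrelevant.
X = np.column_stack([h1, h1 + 0.5 * h3, h2, h3])
y = h1 + h2
selected, value = qp_select(X, y, d=2)
```

The search selects the two mutually uncorrelated relevant columns and rejects both the near-duplicate and the irrelevant one, which is the behaviour the Sim and Rel terms are meant to produce.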
62 / 68
![Page 63: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/63.jpg)
Evaluation criteria for the diesel NIR spectra data set
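| Method | $C_p$ | RSS | $\ln\frac{\lambda_1}{\lambda_n}$ (SVD) | VIF | BIC |
|---|---|---|---|---|---|
| QP $(\rho, \rho)$ ($\tau = 10^{-9}$) | $-110$ | $1.37 \cdot 10^{-18}$ | $-25.7$ | $6.43 \cdot 10^{6}$ | $548.38$ |
| Genetic | $-110.88$ | $7.68 \cdot 10^{-30}$ | $-24$ | $8.13 \cdot 10^{5}$ | $534.19$ |
| LARS | $3.22 \cdot 10^{21}$ | $2.07 \cdot 10^{-7}$ | $-28.3$ | $7.94 \cdot 10^{7}$ | $529.47$ |
| Lasso | $2.5 \cdot 10^{28}$ | $1.61$ | $-27.72$ | $1.03 \cdot 10^{21}$ | $1712.92$ |
| ElasticNet | $2.51 \cdot 10^{28}$ | $1.61$ | $-27.72$ | $1.03 \cdot 10^{21}$ | $1712.92$ |
| Stepwise | $3.66 \cdot 10^{29}$ | $23.56$ | $-36.78$ | $1.94 \cdot 10^{22}$ | $1919.23$ |
| Ridge | $1.59 \cdot 10^{28}$ | $1.02$ | $-36.22$ | $1.07 \cdot 10^{22}$ | $1.79 \cdot 10^{3}$ |

[Figure: dependence of the residual norm on the number of features selected by QP(Sim, Rel)]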
![Page 64: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/64.jpg)
ECoG brain signals to forecast upper limbs’ movements
- Neurotycho.org food-tracking dataset: 32 epidural electrodes capture brain signals of the monkey (ECoG),
- 11 sensors track movements of the hand contralateral to the implant side,
- the experiment duration is 15 minutes,
- the experimenter feeds the monkey ≈ 4.5 times per minute,
- ECoG and motion data were sampled at 1 kHz and 120 Hz, respectively.
64 / 68
![Page 65: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/65.jpg)
Weather forecasting and geo (spatio) temporal dataset
Monthly Ecosystem Respiration by M. Reichstein, GHG-Europe
65 / 68
![Page 66: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/66.jpg)
Spatio temporal dataset, frequency versus time, given spatial position
Electric field measurement, the Van Allen probes by I. Zhelavskaya, Skoltech
66 / 68
![Page 67: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/67.jpg)
More on the problems discussed above

Autoregressive forecasting and feature selection
- Katrutsa A., Strijov V. The multicollinearity problem in feature selection for regression // Information Technologies, 2015, 1 : 8-18.
- Neychev R.G., Katrutsa A.M., Strijov V. Selection of an optimal feature set from a multicorrelated set in the forecasting problem // Industrial Laboratory. Diagnostics of Materials, 2016, 3.
- Motrenko A.P., Rudakov K.V., Strijov V.V. Accounting for the influence of exogenous factors in nonparametric time series forecasting // Moscow University Bulletin. Series 15. Computational Mathematics and Cybernetics, 2016.

Classification of accelerometer time series
- Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14.
- Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016.
- Popova M.S., Strijov V.V. Selection of an optimal model for classification of physical activity from accelerometer measurements // Informatics and Applications, 2015, 9(1) : 79-89.
- Popova M.S., Strijov V.V. Building deep learning networks for time series classification // Systems and Means of Informatics, 2015, 25(3) : 60-77.
- Goncharov A.V., Strijov V.V. Metric classification of time series with weighted alignment relative to class centroids // Informatics and Applications, 2016, 2.

Reconciliation of hierarchical time series forecasts
- Stenina M.M., Strijov V.V. Forecast reconciliation in hierarchical time series forecasting problems // Informatics and Applications, 2015, 9(2) : 77-89.
- Stenina M., Strijov V. Reconciliation of aggregated and detailed forecasts in nonparametric forecasting problems // Systems and Means of Informatics, 2014, 24(2) : 21-34.
67 / 68
![Page 68: AINL 2016: Strijov](https://reader031.vdocuments.us/reader031/viewer/2022030316/58749de11a28abfc5f8b6a83/html5/thumbnails/68.jpg)
Model selection for time series forecasting
Thanks to the Chair of Intelligent Systems, MIPT
- Anastasia Motrenko
- Mikhail Kuznetsov
- Alexandr Aduenko
- Arsentiy Kuzmin
- Maria Stenina
- Alexandr Katrutsa
- Oleg Bakhteev
- Maria Popova
- Andrey Kulunchakov
- Mikhail Karasikov
- Radoslav Neychev
- Alexey Goncharov
- Roman Isachenko
- Maria Vladimirova
http://[email protected]
68 / 68