time series (2) - unibo.it · pdf filefor undefined terms (e.g., s ... a real-world graphical...
TRANSCRIPT
1
Time Series (2)Information Systems M
Prof. Paolo Ciaccia
http://www-db.deis.unibo.it/courses/SI-M/
The “alignment” problem
� Euclidean distance, as well as other Lp-norms, are not robust w.r.t., even
small, contractions/expansions of the signal along the time axis
� E.g., speech signals, music
� Intuitively, we would need a distance measure that is able to “match” a
point of time series s even with “surrounding” points of time series q
� Alternatively, we may view the time axis as a “stretchable” one
� A distance like this exists, and is called “Dynamic Time Warping” (DTW)
Sistemi Informativi M 2
Fixed Time Axis
Sequences are aligned “one to one”
“Warped” Time Axis
Non-linear alignments are possible
Time series (2)
2
The DTW distance
� The DTW distance is well-defined even for series of different length
� It is parametric in a “ground” distance d, which measures the difference
between samples of the series
� Usually, di,j ≡ d(si,qj) = (si - qj)2 or d(si - qj) = |si - qj|
� Let s[1:n] and q[1:m] be two series of length n and m, respectively
� The definition of DTW is recursive:
DTW(s[1:n],q[1:m])2 = dn,m + min{DTW(s[1:n],q[1:m-1])2,
DTW(s[1:n-1],q[1:m])2,
DTW(s[1:n-1],q[1:m-1])2 }
DTW(s[1:1],q[1:1])2 = d1,1
� For undefined terms (e.g., s[1:0]) substitute ∞ in the above equation
Time series (2) Sistemi Informativi M 3
The DTW distance: an example
� Consider the two time series:
� This is the d matrix of
ground distances, di,j = (si - qj)2
� On the right, the (squared) “DTW” matrix: DTWi,j = DTW(s[1:i],q[1:j])2
Sistemi Informativi M 4
7 25 16 25 36 16 9
3 1 0 1 4 0 1
4 4 1 4 9 1 0
5 9 4 9 16 4 1
2 0 1 0 1 1 4
1 1 4 1 0 4 9
d 2 3 2 1 3 4
s
q
7 40 22 31 43 24 15
3 15 6 7 11 8 6
4 14 6 9 18 8 5
5 10 5 11 18 7 5
2 1 2 2 3 4 8
1 1 5 6 6 10 19
DTW 2 3 2 1 3 4
s
q
Time series (2)
1 2 3 4 5 6
s 1 2 5 4 3 7
q 2 3 2 1 3 4
n=m=6
L2(s,q) = √√√√29
3
What DTW computes (1)
� Consider the d matrix, and the following “game”:
� Start from (1,1) and end in (n,m)
� Take one step at a time
� At each step, move only by increasing i, j, or both
� I.e., never go back!
� “Jumps” are not allowed!
� Sum all distances you have found in the “warping path”
� This is 21 in the figure below
Time series (2) Sistemi Informativi M 5
7 25 16 25 36 16 9
3 1 0 1 4 0 1
4 4 1 4 9 1 0
5 9 4 9 16 4 1
2 0 1 0 1 1 4
1 1 4 1 0 4 9
2 3 2 1 3 4
s
warping path w
q
find the optimal (minimum cost)
warping path!
What DTW computes (2)
� Each warping path is a way to “align” (match) s and q, such that all samples are
matched with at least one sample of the other series
� The DTW distance is the (square root of the) cost of the optimal warping path
� The “Euclidean path” moves only along the main diagonal, and costs 29 in the example
� The recursive definition allows DTW to be computed in O(n×m) time, even if the
number of warping paths is exponential!
� This is an example of dynamic programming algorithm
� When the DTW matrix has been filled, the optimal warping path can be
recovered by going back from DTWn,m
Time series (2) Sistemi Informativi M 6
s
q
7 40 22 31 43 24 15
3 15 6 7 11 8 6
4 14 6 9 18 8 5
5 10 5 11 18 7 5
2 1 2 2 3 4 8
1 1 5 6 6 10 19
DTW 2 3 2 1 3 4
4
Visualizing the optimal warping path
Time series (2) Sistemi Informativi M 7
q
sq
s
q
s
A real-world graphical example
Sistemi Informativi M 8
Power-Demand time teries
Each sequence corresponds
to a week’s demand for power
in a Dutch research facility
in 1997
Monday was a holiday
Wednesday was a holiday
Time series (2)
5
Fast searching with DTW
� We have now 2 problems to face, if we want to use DTW for searching:
1. Computing the DTW distance is very time-consuming
2. How to index it?
� Both problems can be solved:
1. Use a lower-resolution approximation of the time series
� However the method can introduce false dismissals
Sistemi Informativi M 9
1.3 sec
22.7 sec
Time series (2)
DTW with global constraints
� Usually, even to avoid “pathological” alignments, one sets some “global
constraint” on the allowed warping paths
� One of them, suitable for the case n=m, is the so-called Sakoe-Chiba band of
width b, which reduces the complexity to O(n×b)
Time series (2) Sistemi Informativi M 10
q
s
The Sakoe-Chiba band
of width b=4
s
q
7 43 24 17
3 7 11 8 8
4 6 9 18 8 7
5 10 5 11 18 7
2 1 2 2 3
1 1 5 6
DTW 2 3 2 1 3 4
Our example with b=2
6
Lower-bounding the DTW distance (1)
� An effective indexing technique for DTW has been proposed in [Keo02]
� The method requires a global constraint (e.g. Sakoe-Chiba band)
� The basic idea of LBKeogh is to create an “envelope”, whose “thickness”
depends on b, of the query q, Env(q) = (Low(q),Up(q))
Sistemi Informativi M 11Time series (2)
Low
Up
q
Upi = max{q[i-b:i+b]}
Lowi = min{q[i-b:i+b]}
Lower-bounding the DTW distance (2)
� Env(q) provides the means to derive an Euclidean-like lower-bound to the
DTW distance between q and any series s
� For this, each sample of s is aligned exactly once
� There are 3 cases to consider:
� si > Upi: since si must be aligned with at least one sample of q[i-b:i+b], it follows that this alignment costs at least (si - Upi)
2
� si < Lowi: since si must be aligned with at least one sample of q[i-b:i+b], it follows that the cost of this alignment is at least (Lowi- si)
2
� si ∈ [Lowi,Upi]: in this case only the trivial bound 0 applies
� Thus, putting all together:
Time series (2) Sistemi Informativi M 12
( )( )∑
=
<−
>−
=n
1i
ii
2
ii
ii
2
ii
2
Keogh
otherwise0
Lows ifsLow
Ups ifUps
q)(s,LB
7
Visualizing LBKeogh
� The sum of the (squared) lengths of the vertical lines equals LBKeogh(s,q)2
Time series (2) Sistemi Informativi M 13
sUp
Lowq
Indexing the DTW (1)
� Because of high-dimensionality, an approximate representation is needed
for indexing time series
� The second step of Keogh’s method computes, for each time series s in the
DB, its PAA-approximation, s’, using a suitable window size W
� Let n’ = n/W be the dimensionality of s’
Sistemi Informativi M 14Time series (2)
0 20 40 60 80 100 120 140
s
s'
W
( )
W
s
s
Wi
1W1it
t
'
i
∑×
+×−==
8
Indexing the DTW (2)
� PAA-approximations are indexed with an R-tree
� The region of a node N, Reg(N), is completely defined by the coordinates of
the two opposite vertices L and H (li ≤ hi):
� What remains to be done is to compute a PAA-approximation of Env(q)…
Sistemi Informativi M 15Time series (2)
L = {l1, l2, …, ln’}
H = {h1, h2, …, hn’}
h1
h2
hi
l1l2
li
Indexing the DTW (3)
� The PAA-approximation of the envelope, Env’(q) = (Low’(q),Up’(q)),
“encloses the envelope”:
Sistemi Informativi M 16Time series (2)
( ){ }( ){ }
iW1W1i
'
i
iW1W1i
'
i
Low,...,LowminLow
Up,...,UpmaxUp
+−
+−
=
=
Low’
Up’
q
9
Indexing the DTW (4)
� The final step consists in defining a “MinDist” between Env’(q) and Reg(N),
which results in:
� It can be proved that, if s ∈Reg(N), this distance lower bounds the DTW
distance between s and q
Sistemi Informativi M 17Time series (2)
( )( )∑
=
<−
>−
=n'
1i
'
ii
2
i
'
i
'
ii
2'
ii
2'
min
otherwise0
Lowh ifhLow
Upl ifUpl
W(q))Env(Reg(N),d
Low’
Up’
hi
li
Some experimental results (from [Keo02])
� Two datasets (left: synthetic; right: a mix of real data)
Sistemi Informativi M 18
0
0.2
0.4
0.6
0.8
1
210 212 214 216 218 220
No
rmalized
CP
U C
ost
210 212 214 216 218 220
Random Walk II Mixed Bag
Seq. scanLB_Keogh
Seq. scanLB_Keogh
DB size DB size
Time series (2)
10
Final considerations
� We have just seen some basic techniques to deal with (large) time series
databases
� Several other distance measures have been proposed, besides DTW, to deal with the “alignment” problem
� Many other relevant problems exist and have attracted interest, among
which:
� Searching for similar sub-sequences
� Searching for multi-dimensional time series (i.e., trajectories)
Sistemi Informativi M 19Time series (2)
Applying time series techinquesto shape matching
The WARP case
11
WARP (Bartolini, Ciaccia & Patella, 2005)
� WARP (= WARP: Accurate Retrieval based on Phase) is a shape matching
technique characterized by the following features:
� DFT-based dimensionality reduction
� Modification of Fourier coefficients so as to achieve invariance w.r.t. a series of transformations
� DTW distance for comparing shape descriptors
21
Edge detection
Parameterization
DFT
Invariance transformations
Dimensionality reduction
amplitude phase
amplitude phase
Boundary parameterization
� Starting from a boundary description obtained through a shape extraction
algorithm, WARP parameterizes the boundary to obtain a complex discrete-
time periodic signal whose period N equals the number of points in the
object boundary
� The output of this step is therefore the signal
z(t) = x(t)+ j y(t) t= 0,…,N-1
Time series (2) Sistemi Informativi M 22
Edge detection
Parameterization
12
DFT
� From the z signal, a frequency representation is obtained using DFT:
where Rm and Θm are the amplitude and the phase of the m-th DFT
coefficient, respectively
Time series (2) Sistemi Informativi M 23
DFT
amplitude phase
12N1,...,,...,-1,0,2Nm
eR z(t)eZ mjΘ
m
1N
0t
mt/N2πj
m
−−=
==∑−
=
−
Dimensionality reduction
� A lower-dimensional approximation is obtained by retaining only the M
Fourier coefficients closer to 0 (m = -M/2,…,-1,0,1,…,M/2-1)
Time series (2) Sistemi Informativi M 24
Dimensionality reduction
amplitude phase
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 10 100 1000
Fraction of energy vs no. of coefficientsM=32 (90% of the energy) is used in the
experiments
13
Invariance transformations
� The purpose of this step is to modify DFT coefficients so as to obtain
invariance with respect to:
1. Translation (= changing the object position)
2. Scaling (= changing the object size)
3. Rotation
4. Starting point of parameterization
� Invariance w.r.t. 3. and 4. can be trivially obtained by discarding the phase
of DFT coefficients, but this negatively influences the accuracy of retrieval
� The transformations lead to a set of normalized DFT (N-DFT) coefficients ZN
Time series (2) Sistemi Informativi M 25
Invariance transformations
amplitude phase
The effect of phase
� By using only the amplitude spectra, and comparing them using L2,
(a) results more similar to (b) than to (c) (!?)
Time series (2) Sistemi Informativi M 26
amplitude phase
14
1. Translation
� Each point z(t) is shifted by a
constant value c: z’(t) = z(t)+c
� The DFT of the z’ signal is:
� m ≠ 0: the sum is 0
� m = 0: the sum is ≠ 0
� Translation only influences the
DC coefficient, which indeed is
the average of the signal
The 0-frequency
DFT coefficient is ignored:
Time series (2) Sistemi Informativi M 27
z(t)
z(t) + c
c
∑
∑−
=
−
−
=
−
+=
+=
1N
0t
mt/N2πj
m
1N
0t
mt/N2πj'
m
ec Z
c)e(z(t)Z
0ZN
0 =
2. Scaling
� If z’(t) = k z(t), the DFT of z’ is:
Time series (2) Sistemi Informativi M 28
z(t)
k z(t)m
'
mm
'
m
m
1N
0t
mt/N2πj'
m
ΘΘ ;Rk R
k Z z(t)ek Z
==
==∑−
=
−
� Scaling only influences the
amplitude of DFT coefficients
� To obtain scale invariance:
The amplitude of the N-DFT
coefficients is normalized:
1m
N
m R/RR =
15
3. Rotation
� If the image is rotated counter-clockwise
by an angle θ, it is z’(t) = z(t) ejθ and
Time series (2) Sistemi Informativi M 29
z(t)
z(t) ejθ
θ
θΘΘ ;RR
e Zez(t)eZ
m
'
mm
'
m
jθ
m
1N
0t
mt/N2πjjθ'
m
+==
==∑−
=
−
� Rotation adds to the phase of
each coefficient a constant
value
� To achieve rotation-invariance:
The phase of the N-DFT
coefficients is obtained as, e.g.:
ΘΘΘΘNm = ΘΘΘΘm - ΘΘΘΘ 1
The term to be subtracted is
indeed different, since it also
depends on the last invariance…
4. Starting point
� Changing the starting point
corresponds to a shifting in the
time domain
� Thus, it is z’(t) = z(t-t0)
(z’ is “anticipated” by t0)
� The “shift theorem” of DFT yields:
� Thus, all coefficients are rotated
by an amount that is linear in the
frequency (m)…
Time series (2) Sistemi Informativi M 30
z(t) p0
z(t - t0)
/Nmt2πΘΘ ;RR
e Z)et-z(tZ
0m
'
mm
'
m
/Nmt2πj
m
1N
0t
mt/N2πj
0
'
m0
−==
== −−
=
−∑
16
3. and 4. together
� Combining 3. and 4. we get: z’(t) = z(t-t0) ejθ and
� In particular, it is:
� The normalized phase is:
� To see why this works, consider the N-DFT coefficient of the z’ signal:
Time series (2) Sistemi Informativi M 31
/Nt2πθΘΘ
/Nt2πθΘΘ
01-
'
1-
01
'
1
++=
−+=
2
ΘΘ-
2
ΘΘΘΘ
11-11-m
N
m
+−+= m
2
ΘΘ-
2
ΘΘmΘΘ
'
1
'
1-
'
1
'
1-'
m
N'
m
+−+=
/Nmt2πθΘΘ 0m
'
m −+=
N
m11-11-
m0101-
0101-0m
Θ2
ΘΘ
2
ΘΘmΘ
2
π/N2tθΘπ/N2tθΘ-
2
π/N2tθ-Θπ/N2tθΘmπ/N2mtθΘ
=+
−−
+=−++++
+−+++−+=
Comparing images
� Images are compared using DTW in the time domain by applying the Inverse
DFT (IDFT) to the N-DFT descriptors
Time series (2) Sistemi Informativi M 32
amplitude
phase
QUERY IMAGE=
?
IDFT
DTW
amplitude
phase
DB IMAGE
IDFT
17
Experimental results (1)
� NoPhase-EU: Phase is discarded, Euclidean distance
� WARP-EU: Phase is retained, Euclidean distance
Time series (2) Sistemi Informativi M 33
-10
0
10
20
30
40
50
60
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Pre
cisi
on
ga
in (
%)
Recall
NoPhase-EU
Warp-EU
Experimental results (2)
� MAXC: parameterization method using maximum curvature points
Time series (2) Sistemi Informativi M 34
0
0.1
0.2
0.3
0.4
0.5
0.6
0 0.2 0.4 0.6 0.8 1
Pre
cisi
on
Recall
WARP
MAXC