


Multi-scale streaming anomalies detection for time series

B Ravi Kiran

Université Lille 3, CRIStAL, Lille, France

Streaming anomaly detection

- $x(t)$ are observations over time $t$, with new data arriving as $t$ advances.
- Problem: detect a window of $x(t)$ over $[t-p+1 : t]$ which "deviates" from normal behavior.
- Motivation: how to become invariant to the window size/scale of the pseudo-periodic structure in $x(t)$?

Streaming multi-scale Lag-matrix construction

$$X^p_t = [x_t, x_{t-1}, \ldots, x_{t-p+1}]^T \in \mathbb{R}^p$$
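The lag vector above can be built incrementally as samples arrive. The following is a minimal sketch, assuming a plain Python iterable as the stream; the function name `lag_vectors` and the toy stream are illustrative and not taken from the poster.

```python
from collections import deque

def lag_vectors(stream, p):
    """Yield (t, X_t) with X_t = [x_t, x_{t-1}, ..., x_{t-p+1}] once p samples exist."""
    window = deque(maxlen=p)      # keeps only the p most recent samples
    for t, x in enumerate(stream):
        window.appendleft(x)      # newest sample first, matching the definition
        if len(window) == p:
            yield t, list(window)

# Hypothetical toy stream, evaluated at scales p = 2**j for j = 1, 2
stream = [0.0, 1.0, 2.0, 3.0, 4.0]
lags_p2 = list(lag_vectors(stream, 2))
lags_p4 = list(lag_vectors(stream, 4))
```

Running one generator per scale $p = 2^j$ gives the multi-scale set of lag vectors; the first $p - 1$ time steps produce no vector at scale $p$.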

Streaming PCA

- Dimensionality reduction for time series lag embedding
- Recursive update for principal subspace
- Linear Principal Component Analysis criterion: given $X^p \in \mathbb{R}^{T \times p}$, $w_p$ is defined as the 1-D projection capturing most of the energy of the samples:

$$w_p = \arg\min_{\|w\|=1} \sum_{t=1}^{T} \|X^p_t - (ww^T)X^p_t\|^2$$
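As a short worked step (standard PCA algebra, not stated on the poster), expanding the reconstruction error for a unit-norm $w$ shows why minimizing it is the same as capturing the most energy:

```latex
\begin{aligned}
\|X^p_t - (ww^T)X^p_t\|^2
  &= \|X^p_t\|^2 - 2\,(w^T X^p_t)^2 + (w^T X^p_t)^2\,\|w\|^2 \\
  &= \|X^p_t\|^2 - (w^T X^p_t)^2 \qquad (\|w\| = 1), \\
w_p &= \arg\max_{\|w\|=1} \sum_{t=1}^{T} (w^T X^p_t)^2
     = \arg\max_{\|w\|=1} w^T \Big( \sum_{t=1}^{T} X^p_t (X^p_t)^T \Big) w .
\end{aligned}
```

The first term does not depend on $w$, so the minimizer is the leading eigenvector of the (uncentered) sample covariance $\sum_t X^p_t (X^p_t)^T$.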

Multiscale streaming PCA

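Algorithm 1: Streaming PCA

Initialization: $w_j \leftarrow 0$, $\sigma_j^2 \leftarrow \epsilon$ with $\epsilon \ll 1$
for $t = 1, \ldots, T$ do
    for $j = 1, \ldots, J$ do
        $Z^j_t \leftarrow H_{2^j}^T X^j_t$
        $y^j_t \leftarrow w_j^T Z^j_t$
        $\sigma_j^2 \leftarrow \sigma_j^2 + (y^j_t)^2$
        $e^j_t \leftarrow Z^j_t - y^j_t w_j$
        $w_j \leftarrow w_j + \sigma_j^{-2} y^j_t e^j_t$
        $\tilde{Z}^j_t \leftarrow (w_j^T Z^j_t)\, w_j$
        $\alpha^j_t \leftarrow \|\tilde{Z}^j_t - Z^j_t\|^2$
    end for
end for
return $\boldsymbol{\alpha} \in \mathbb{R}^{T \times J}$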
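The inner loop of Algorithm 1 can be sketched for a single scale as follows. This is an illustrative sketch, not the author's implementation: it takes already-transformed vectors $Z_t$ as input, and it assumes a unit-vector start for $w$ rather than zero, because with $w = 0$ the projection $y$ is always zero and the update term never moves $w$. The function name and the toy data are hypothetical.

```python
import math

def streaming_pca_errors(vectors, eps=1e-3):
    """Rank-1 streaming PCA on one scale; returns (errors alpha_t, final w)."""
    vectors = [list(map(float, z)) for z in vectors]
    p = len(vectors[0])
    # Assumption: unit-vector initialization (a zero start would stay at zero).
    w = [1.0 / math.sqrt(p)] * p
    sigma2 = eps
    alphas = []
    for z in vectors:
        y = sum(wi * zi for wi, zi in zip(w, z))              # y_t = w^T Z_t
        sigma2 += y * y                                       # accumulated energy along w
        e = [zi - y * wi for zi, wi in zip(z, w)]             # e_t = Z_t - y_t w
        w = [wi + (y / sigma2) * ei for wi, ei in zip(w, e)]  # w <- w + sigma^-2 y_t e_t
        y_new = sum(wi * zi for wi, zi in zip(w, z))
        recon = [y_new * wi for wi in w]                      # Z~_t = (w^T Z_t) w
        alphas.append(sum((r - zi) ** 2 for r, zi in zip(recon, z)))
    return alphas, w

# Hypothetical data: 100 vectors along the direction [3, 4], then one orthogonal outlier.
normal = [[3.0 * c, 4.0 * c] for c in (1.0, 2.0, 1.5, 0.5) * 25]
alphas, w = streaming_pca_errors(normal + [[4.0, -3.0]])
```

On this toy input the error stays near zero while the data follows the learned direction and jumps to roughly the squared norm of the outlier on the last step, which is the behavior the anomaly score $\alpha^j_t$ relies on.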

Given $x(t) \in \mathbb{R}$ for $t = 1 : T$
- We evaluate the lag matrix $X \in \mathbb{R}^{T \times p}$ where $p = 2^j$.
- For each $X_t \in \mathbb{R}^p$ we perform a change of basis $Z_t := \Phi^T X_t$.
- $H$ refers to a unitary transform that can
    - localize a deviation from the local mean and variations,
    - preserve the variance.
- Haar transform $\Phi = H$:

$$H_{2N} = \frac{1}{\sqrt{2}} \begin{bmatrix} H_N \otimes [1, 1] \\ I_N \otimes [1, -1] \end{bmatrix}$$
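The Haar recursion above can be unrolled into an explicit matrix. The sketch below assumes the base case $H_1 = [1]$, which the poster does not state, and uses plain lists of rows; `haar_matrix` and `apply_transposed` are illustrative names.

```python
import math

def haar_matrix(n):
    """Return the n x n orthonormal Haar matrix (n a power of two) as a list of rows."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]                      # assumed base case H_1 = [1]
    s = 1.0 / math.sqrt(2.0)
    while len(h) < n:
        m = len(h)
        # Top block: H_N kron [1, 1], every entry repeated twice along the row.
        top = [[s * v for v in row for _ in (0, 1)] for row in h]
        # Bottom block: I_N kron [1, -1], one local difference per row.
        bottom = [[0.0] * (2 * m) for _ in range(m)]
        for i in range(m):
            bottom[i][2 * i], bottom[i][2 * i + 1] = s, -s
        h = top + bottom
    return h

def apply_transposed(h, x):
    """Compute Z = H^T x, the change of basis used in the text."""
    n = len(h)
    return [sum(h[r][c] * x[r] for r in range(n)) for c in range(n)]

h4 = haar_matrix(4)
z = apply_transposed(h4, [1.0, 2.0, 3.0, 4.0])
```

Because the rows of the matrix are orthonormal, the transform preserves the squared norm of the lag vector, which is the "preserve the variance" property listed above.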

Hierarchical (Approximation) PCA

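[Figure: hierarchy of projections. Four consecutive blocks of length $2^{j-1}$ are each projected with $w_{j-1}^T$; the results are combined pairwise and projected with $w_j^T$, then with $w_{j+1}^T$ at the top. One region of the diagram is labeled "Future".]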

For scale $p_{j+1} = 2p_j$, a reduced lag matrix $Z^{j+1}_t$ is obtained by projection of each component of size $p_j$ on the principal direction obtained at this scale, i.e. $Z^{j+1}_t = [w_j^T Z^j_t,\; w_j^T Z^j_{t-2^j}]^T$ with $Z^1_t = X^1_t$. The principal direction is updated after this step.
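The construction of the reduced vector at the next scale can be sketched as below. It is a minimal illustration that assumes the scale-$j$ vectors are stored one per time step in a list and that $w_j$ is held fixed; the function name, the direction `w` and the toy history are hypothetical.

```python
def reduced_vectors(z_history, w_j, j):
    """Return Z^{j+1}_t = [w_j^T Z^j_t, w_j^T Z^j_{t - 2**j}] for every t >= 2**j."""
    lag = 2 ** j

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return [[dot(w_j, z_history[t]), dot(w_j, z_history[t - lag])]
            for t in range(lag, len(z_history))]

# Hypothetical scale-1 history Z_t = [x_t, x_{t-1}] and a fixed direction w_1.
history = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
w = [1.0, 0.0]
z_next = reduced_vectors(history, w, 1)
```

Each reduced vector has only two entries regardless of $p_j$, so the cost of the PCA update at the coarser scale does not grow with the window size.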

Global flow

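[Figure: processing pipeline, summarized below.]

1. Input $x_t$.
2. Build the lag vectors $X^{2^j}_t$ for $j = 1, 2, \ldots, J$.
3. Update $w_1, w_2, \ldots, w_J$.
4. Compute the reconstruction errors $\alpha^j_t = \|X^j_t - \tilde{X}^j_t\|$ for $j = 1, \ldots, J$.
5. Aggregate across scales with one of:
    - Norm: $\|\boldsymbol{\alpha}_t\|^2$
    - 2nd iteration: $\|\tilde{\boldsymbol{\alpha}}_t - \boldsymbol{\alpha}_t\|^2$
    - Least correlated scale: $\arg\min_j \sum_i (\boldsymbol{\alpha}^T \boldsymbol{\alpha})_{ji}$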
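Two of the aggregation rules in the flow above can be sketched directly from their formulas. The code assumes the error matrix is given as $T$ rows of $J$ per-scale errors and implements $\boldsymbol{\alpha}^T\boldsymbol{\alpha}$ literally (uncentered); the function names and the toy matrix are illustrative.

```python
def norm_score(alpha):
    """Per-time-step squared norm across scales: ||alpha_t||^2."""
    return [sum(a * a for a in row) for row in alpha]

def least_correlated_scale(alpha):
    """Index j minimizing sum_i (alpha^T alpha)_{ji}; alpha is T rows of J errors."""
    n_scales = len(alpha[0])
    # J x J Gram matrix of the error columns.
    gram = [[sum(row[j] * row[i] for row in alpha) for i in range(n_scales)]
            for j in range(n_scales)]
    row_sums = [sum(g) for g in gram]
    return min(range(n_scales), key=row_sums.__getitem__)

# Hypothetical error matrix: 3 time steps, 3 scales.
alpha = [[1.0, 0.1, 1.0],
         [2.0, 0.2, 1.0],
         [0.5, 0.1, 3.0]]
scores = norm_score(alpha)
best = least_correlated_scale(alpha)
```

The "2nd iteration" rule would instead feed each row $\boldsymbol{\alpha}_t$ into another streaming PCA pass and score its reconstruction error.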

AUC performances across different aggregation methods

Performance evaluation: area under the receiver operating characteristic curve (AUC; TPR vs FPR), from 0 (worst) to 1 (perfect).
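For reference, the AUC can be computed from anomaly scores and binary labels through its rank interpretation: the probability that a randomly chosen anomalous point scores higher than a randomly chosen normal one. The sketch below is a simple quadratic-time version with hypothetical inputs, not the evaluation code behind the reported results.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]   # anomalous points
    neg = [s for s, y in zip(scores, labels) if y == 0]   # normal points
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: three of the four (anomaly, normal) pairs are ordered correctly.
example = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```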

Effect of 2nd iteration of Streaming PCA

Least correlated scale vs 2nd iteration of Streaming PCA:
- The least correlated scale and the 2nd iteration of streaming PCA both decorrelate the reconstruction error across scales.
- The least correlated scale performs better when there is a single-scale structure across the time series. The second iteration performs better when there are multiple scales.

Future work

- Understand bounds on the reconstruction error $\boldsymbol{\alpha}(t)$ for Streaming PCA.
- Better baseline by comparing with streaming covariance estimation.
- Probabilistic anomaly score using a Gaussian neg-log score.
- Recursively calculable multi-scale time series representation $H^T X_t$ to model long-range dependencies.

References

- Papadimitriou et al., Optimal multi-scale patterns in time series streams, ACM SIGMOD, 2006
- Bin Yang, Projection approximation subspace tracking, Trans. Signal Processing, Jan. 1995
- Evaluating real-time anomaly detection algorithms, Numenta benchmark, ICMLA 2015

https://beedotkiran.github.io/anomaly.html