indexing and mining streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1...
TRANSCRIPT
CMU SCS
Indexing and Mining Time
Sequences
Christos Faloutsos and Lei Li
CMU
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 2
Outline
• Motivation
• Similarity Search and Indexing
• DSP (Digital Signal Processing)
• Linear Forecasting
• Kalman filters
• fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 3
Problem definition
• Given: one or more sequences
x1 , x2 , … , xt , …
(y1, y2, … , yt, …
… )
• Find
– similar sequences; forecasts
– patterns; clusters; outliers
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 4
Motivation - Applications
• Financial, sales, economic series
• Medical
– reactions to new drugs
– elderly care
CMU SCS
ECG - physionet.org
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 5
CMU SCS
EEG - epilepsy
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 6from wikipedia
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 7
Motivation - Applications
(cont‟d)
• „Smart house‟
– sensors monitor temperature, humidity,
air quality
• video surveillance
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 8
Motivation - Applications
(cont‟d)
• civil/automobile infrastructure
– bridge vibrations [Oppenheim+02]
– road conditions / traffic monitoring
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 9
Motivation - Applications
(cont‟d)
• Weather, environment/anti-pollution
– volcano monitoring
– air/water pollutant monitoring
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 10
Motivation - Applications
(cont‟d)
• Computer systems
– „Active Disks‟ (buffering, prefetching)
– web servers (ditto)
– network traffic monitoring
– ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 11
Stream Data: Disk accesses
time
#bytes
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 12
Problem #1:
Goal: given a signal (eg., #packets over time)
Find: patterns, periodicities, and/or compress
year
count lynx caught per year
(packets per day;
temperature per day)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 13
Problem#2: Forecast
Given xt, xt-1, …, forecast xt+1
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 14
Problem#2‟: Similarity search
Eg., Find a 3-tick pattern, similar to the last one
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 15
Problem #3:
• Given: A set of correlated time sequences
• Forecast „Sent(t)‟
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sent
lost
repeated
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 16
Important observations
Patterns, rules, forecasting and similarity indexing are closely related:
• To do forecasting, we need
– to find patterns/rules
– to find similar settings in the past
• to find outliers, we need to have forecasts
– (outlier = too far away from our forecast)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 17
Important topics NOT in this
tutorial:
• Continuous queries
– [Babu+Widom ] [Gehrke+] [Madden+]
• Categorical data streams
– [Hatonen+96]
• Outlier detection (discontinuities)
– [Breunig+00]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 18
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 19
Outline
• Motivation
• Similarity Search and Indexing
– distance functions: Euclidean;Time-warping
– indexing
– feature extraction
• DSP
• ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 20
Importance of distance functions
Subtle, but absolutely necessary:
• A „must‟ for similarity indexing (->
forecasting)
• A „must‟ for clustering
Two major families
– Euclidean and Lp norms
– Time warping and variations
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 21
Euclidean and Lp
n
i
ii yxyxD1
2)(),(x(t) y(t)
...
n
i
p
iip yxyxL1
||),(
•L1: city-block = Manhattan
•L2 = Euclidean
•L
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 22
Observation #1
• Time sequence -> n-d
vector
...
Day-1
Day-2
Day-n
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 23
Observation #2
Euclidean distance is
closely related to
– cosine similarity
– dot product
– „cross-correlation‟
function
...
Day-1
Day-2
Day-n
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 24
Time Warping
• allow accelerations - decelerations
– (with or w/o penalty)
• THEN compute the (Euclidean) distance (+
penalty)
• related to the string-editing distance
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 25
Time Warping
„stutters‟:
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 26
Time Warping
Q: how to compute it?
A: dynamic programming
D( i, j ) = cost to match
prefix of length i of first sequence x with prefix of length j of second sequence y
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 27
Thus, with no penalty for stutter, for sequences
x1, x2, …, xi,; y1, y2, …, yj
),1(
)1,(
)1,1(
min][][),(
jiD
jiD
jiD
jyixjiD x-stutter
y-stutter
no stutter
Time WarpingSkip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 34
Other Distance functions
• piece-wise linear/flat approx.; compare
pieces [Keogh+01] [Faloutsos+97]
• „cepstrum‟ (for voice [Rabiner+Juang])
– do DFT; take log of amplitude; do DFT again!
• Allow for small gaps [Agrawal+95]
CMU SCS
More distance functions.
• Chen + Ng [vldb‟04]: ERP „Edit distance
with Real Penalty‟: give a penalty to stutters
• Keogh+ [kdd‟04]: VERY NICE, based on
information theory: compress each
sequence (quantize + Lempel-Ziv), using
the other sequences‟ LZ tables
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 35
On The Marriage of Lp-norms and Edit Distance, Lei Chen,
Raymond T. Ng:, VLDB‟04
Towards Parameter-Free Data Mining, E. Keogh, S. Lonardi,
C.A. Ratanamahatana, KDD‟04
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 36
Conclusions
Prevailing distances:
– Euclidean and
– time-warping
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 37
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DSP
• ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 38
Indexing
Problem:
• given a set of time sequences,
• find the ones similar to a desirable query
sequence
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 39
day
$price
1 365
day
$price
1 365
day
$price
1 365
distance function: by expert
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 40
Idea: „GEMINI‟
Eg., „find stocks similar to MSFT’
Seq. scanning: too slow
How to accelerate the search?
[Faloutsos96]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 41
day1 365
day1 365
S1
Sn
F(S1)
F(Sn)
„GEMINI‟ - Pictorially
eg, avg
eg,. std
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 42
GEMINI
Solution: Quick-and-dirty' filter:
• extract n features (numbers, eg., avg., etc.)
• map into a point in n-d feature space
• organize points with off-the-shelf spatial
access method („SAM‟)
• discard false alarms
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 43
Examples of GEMINI
• Time sequences: DFT (up to 100 times
faster) [SIGMOD94];
• [Kanellakis+], [Mendelzon+]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 44
Indexing - SAMs
Q: How do Spatial Access Methods (SAMs)
work?
A: they group nearby points (or regions)
together, on nearby disk pages, and answer
spatial queries quickly („range queries‟,
„nearest neighbor‟ queries etc)
For example:
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 45
R-trees
• [Guttman84] eg., w/ fanout 4: group nearby
rectangles to parent MBRs; each group ->
disk page
A
B
C
DE
FG
H
I
J
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 46
R-trees
• eg., w/ fanout 4:
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
F GD E
H I JA B C
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 47
R-trees
• eg., w/ fanout 4:
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 48
R-trees - range search?
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 49
R-trees - range search?
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 50
Conclusions
• Fast indexing: through GEMINI
– feature extraction and
– (off the shelf) Spatial Access Methods
[Gaede+98]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 51
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DSP
• ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 52
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD, etc (data dependent)
• MDS, FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 53
DFT and cousins
• very good for compressing real signals
• more details on DFT/DCT/DWT: later
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 54
DFT and stocks
0.00
2000.00
4000.00
6000.00
8000.00
10000.00
12000.00
1 11 21 31 41 51 61 71 81 91 101 111 121
Fourier appx actual
• Dow Jones Industrial
index, 6/18/2001-
12/21/2001
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 55
DFT and stocks
0.00
2000.00
4000.00
6000.00
8000.00
10000.00
12000.00
1 11 21 31 41 51 61 71 81 91 101 111 121
Fourier appx actual
• Dow Jones Industrial
index, 6/18/2001-
12/21/2001
• just 3 DFT
coefficients give very
good approximation
1
10
100
1000
10000
100000
1000000
10000000
1 11 21 31 41 51 61 71 81 91 101 111 121
Series1
freq
Log(ampl)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 56
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD etc (data dependent)
• MDS, FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 57
SVD
• THE optimal method for dimensionality
reduction
– (under the Euclidean metric)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 58
Singular Value Decomposition
(SVD)• SVD (~LSI ~ KL ~ PCA ~ spectral
analysis...) LSI: S. Dumais; M. Berry
KL: eg, Duda+Hart
PCA: eg., Jolliffe
Details: [Press+],
[Faloutsos96]
day1
day2
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 59
SVD
• Extremely useful tool
– (also behind PageRank/google and Kleinberg‟s
algorithm for hubs and authorities)
• But may be slow: O(N * M * M) if N>M
• any approximate, faster method?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 60
SVD shorcuts
• random projections (Johnson-Lindenstrauss
thm [Papadimitriou+ pods98])
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 61
Random projections
• pick „enough‟ random directions (will be
~orthogonal, in high-d!!)
• distances are preserved probabilistically,
within epsilon
• (also, use as a pre-processing step for SVD
[Papadimitriou+ PODS98])
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 62
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD etc (data dependent), ICA
• MDS, FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 63
Citation
• AutoSplit: Fast and Scalable Discovery of
Hidden Variables in Stream and Multimedia
Databases, Jia-Yu Pan, Hiroyuki Kitagawa,
Christos Faloutsos and Masafumi
Hamamoto
PAKDD 2004, Sydney, Australia
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 64
Motivation:
(Q1) Find patterns in data• Motion capture data (broad jumps)
Left Knee
Right Knee
Energy exerted
Take-off
Landing
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 65
PCA sometimes misses essential
features
• Best SVD axis: not always meaningful!
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 66
Motivation:
(Q1) Find patterns in data
• Human would say
– Pattern 1: along
diagonal
– Pattern 2: along
vertical axis
• How to find these
automatically? Left Knee
Right Knee
60:1
1:1
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 67
Motivation:
(Q2) Find hidden variables
Dow Jones Industrial Average
Alcoa
American
Express
Boeing
Caterpillar
Citi
Group
Find common
hidden variables,
and weights.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 68
Motivation:
(Q2) Find hidden variables
Hidden variable 1 Hidden variable 2
B1,CAT
B1,INTC B2,CAT
B2,INTC?
? ?
Caterpillar Intel
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 69
Motivation:
(Q2) Find hidden variables
“Hidden variable 1” “Hidden variable 2”
0.940.63 0.03
0.64
Caterpillar Intel
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 70
Motivation:
(Q2) Find hidden variables
“General trend” “Internet bubble”
0.940.63 0.03
0.64
Caterpillar Intel
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 80
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 81
MDS / FastMap
• but, what if we have NO points to start
with?
(eg. Time-warping distance)
• A: Multi-dimensional Scaling (MDS) ;
FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 82
MDS/FastMap
01100100100O5
10100100100O4
100100011O3
100100101O2
100100110O1
O5O4O3O2O1~100
~1
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 83
MDS
Multi Dimensional
Scaling
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 84
FastMap
• Multi-dimensional scaling (MDS) can do
that, but in O(N**2) time
• FastMap [Faloutsos+95] takes O(N) time
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 85
FastMap: Application
VideoTrails [Kobla+97]
scene-cut detection (about 10% errors)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 86
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap, IsoMap etc
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #87
Variations
• Isomap [Tenenbaum, de Silva, Langford,
2000]
• LLE (Local Linear Embedding) [Roweis,
Saul, 2000]
• MVE (Minimum Volume Embedding)
[Shaw & Jebara, 2007]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #88
Variations
• Isomap [Tenenbaum, de Silva, Langford,
2000]
• LLE (Local Linear Embedding) [Roweis,
Saul, 2000]
• MVE (Minimum Volume Embedding)
[Shaw & Jebara, 2007]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 89
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 90
Conclusions - Practitioner‟s
guide
Similarity search in time sequences
1) establish/choose distance (Euclidean, time-
warping,…)
2) extract features (SVD, DWT, MDS), and use
an SAM (R-tree/variant) or a Metric Tree (M-
tree)
2‟) for high intrinsic dimensionalities, consider sequential
scan (it might win…)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 91
Books
• William H. Press, Saul A. Teukolsky, William T.
Vetterling and Brian P. Flannery: Numerical Recipes in C,
Cambridge University Press, 1992, 2nd Edition. (Great
description, intuition and code for SVD)
• C. Faloutsos: Searching Multimedia Databases by Content,
Kluwer Academic Press, 1996 (introduction to SVD, and
GEMINI)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 92
References
• Agrawal, R., K.-I. Lin, et al. (Sept. 1995). Fast Similarity
Search in the Presence of Noise, Scaling and Translation in
Time-Series Databases. Proc. of VLDB, Zurich,
Switzerland.
• Babu, S. and J. Widom (2001). “Continuous Queries over
Data Streams.” SIGMOD Record 30(3): 109-120.
• Breunig, M. M., H.-P. Kriegel, et al. (2000). LOF:
Identifying Density-Based Local Outliers. SIGMOD
Conference, Dallas, TX.• Berry, Michael: http://www.cs.utk.edu/~lsi/
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 93
References
• Ciaccia, P., M. Patella, et al. (1997). M-tree: An Efficient
Access Method for Similarity Search in Metric Spaces.
VLDB.
• Foltz, P. W. and S. T. Dumais (Dec. 1992). “Personalized
Information Delivery: An Analysis of Information
Filtering Methods.” Comm. of ACM (CACM) 35(12): 51-
60.
• Guttman, A. (June 1984). R-Trees: A Dynamic Index
Structure for Spatial Searching. Proc. ACM SIGMOD,
Boston, Mass.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 94
References
• Gaede, V. and O. Guenther (1998). “Multidimensional
Access Methods.” Computing Surveys 30(2): 170-231.
• Gehrke, J. E., F. Korn, et al. (May 2001). On Computing
Correlated Aggregates Over Continual Data Streams.
ACM Sigmod, Santa Barbara, California.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li Part2.1 #95
References
• Gunopulos, D. and G. Das (2001). Time Series
Similarity Measures and Time Series Indexing.
SIGMOD Conference, Santa Barbara, CA.
• Eamonn J. Keogh, Themis Palpanas, Victor B.
Zordan, Dimitrios Gunopulos, Marc Cardle:
Indexing Large Human-Motion Databases. VLDB
2004: 780-791
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 96
References
• Hatonen, K., M. Klemettinen, et al. (1996). Knowledge
Discovery from Telecommunication Network Alarm
Databases. ICDE, New Orleans, Louisiana.
• Jolliffe, I. T. (1986). Principal Component Analysis,
Springer Verlag.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 97
References
• Keogh, E. J., K. Chakrabarti, et al. (2001). Locally
Adaptive Dimensionality Reduction for Indexing Large
Time Series Databases. SIGMOD Conference, Santa
Barbara, CA.
• Kobla, V., D. S. Doermann, et al. (Nov. 1997).
VideoTrails: Representing and Visualizing Structure in
Video Sequences. ACM Multimedia 97, Seattle, WA.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 98
References
• Oppenheim, I. J., A. Jain, et al. (March 2002). A MEMS
Ultrasonic Transducer for Resident Monitoring of Steel
Structures. SPIE Smart Structures Conference SS05, San
Diego.
• Papadimitriou, C. H., P. Raghavan, et al. (1998). Latent
Semantic Indexing: A Probabilistic Analysis. PODS,
Seattle, WA.
• Rabiner, L. and B.-H. Juang (1993). Fundamentals of
Speech Recognition, Prentice Hall.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 99
References
• Traina, C., A. Traina, et al. (October 2000). Fast feature
selection using the fractal dimension,. XV Brazilian
Symposium on Databases (SBBD), Paraiba, Brazil.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 100
References• Dennis Shasha and Yunyue Zhu High Performance
Discovery in Time Series: Techniques and Case Studies Springer 2004
• Yunyue Zhu, Dennis Shasha ``StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time'‘ VLDB, August, 2002. pp. 358-369.
• Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD, June 2003, San Diego, CA.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #101
References
• Lawrence Saul & Sam Roweis. An Introduction to Locally Linear Embedding (draft)
• Sam Roweis & Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, v.290 no.5500 , Dec.22, 2000. pp.2323--2326.
• B. Shaw and T. Jebara. "Minimum Volume Embedding" . Artificial Intelligence and Statistics, AISTATS, March 2007.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #102
References
• Josh Tenenbaum, Vin de Silva and John Langford. A
Global Geometric Framework for Nonlinear
dimensionality Reduction. Science 290, pp. 2319-2323,
2000.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 103
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 104
Outline
• Motivation
• Similarity Search and Indexing
• DSP (DFT, DWT)
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 105
Outline
• DFT
– Definition of DFT and properties
– how to read the DFT spectrum
• DWT
– Definition of DWT and properties
– how to read the DWT scalogram
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 106
Introduction - Problem#1
Goal: given a signal (eg., packets over time)
Find: patterns and/or compress
year
count
lynx caught per year
(packets per day;
automobiles per hour)-2000
0
2000
4000
6000
8000
1 14 27 40 53 66 79 92 105
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 107
What does DFT do?
A: highlights the periodicities
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 108
DFT: definition• For a sequence x0, x1, … xn-1
• the (n-point) Discrete Fourier Transform is
• X0, X1, … Xn-1 :
)/2exp(*/1
)1(
1,,0)/2exp(*/1
1
0
1
0
ntfjXnx
j
nfntfjxnX
n
t
ft
n
t
tf
inverse DFT
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 109
DFT: definition
• Good news: Available in all symbolic math
packages, eg., in „mathematica‟
x = [1,2,1,2];
X = Fourier[x];
Plot[ Abs[X] ];
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 110
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
)(Im)(Re 222
fffXXA Amplitude:
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 111
DFT: examples
flat
0.000
0.010
0.020
0.030
0.040
0.050
0.060
0.070
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Amplitude
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 112
DFT: examples
Low frequency sinusoid
-0.150
-0.100
-0.050
0.000
0.050
0.100
0.150
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 113
DFT: examples
• Sinusoid - symmetry property: Xf = X*n-f
-0.150
-0.100
-0.050
0.000
0.050
0.100
0.150
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 114
DFT: examples
• Higher freq. sinusoid
-0.080
-0.060
-0.040
-0.020
0.000
0.020
0.040
0.060
0.080
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.1
0.2
0.3
0.4
0.5
0.6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 115
DFT: examples
examples
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
=
+
+
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 116
DFT: examples
examples
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Freq.
Ampl.
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 117
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 118
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
• Definition of DFT and properties
• how to read the DFT spectrum
– DWT
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 119
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
)(Im)(Re 222
fffXXA Amplitude:
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 120
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 121
1
12
23
34
45
56
67
78
89
10
0
11
1
DFT: Amplitude spectrum
actual mean mean+freq12
year
count
Freq.
Ampl.
freq=12
freq=0
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 122
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
Freq.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 123
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
• A1: (lossy) compression
• A2: pattern discovery 1
12
23
34
45
56
67
78
89
10
0
11
1
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 124
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
• A1: (lossy) compression
• A2: pattern discovery
actual mean mean+freq12
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 125
DFT - Conclusions
• It spots periodicities (with the „amplitude spectrum’)
• can be quickly computed (O( n log n)), thanks to the FFT algorithm.
• standard tool in signal processing (speech, image etc signals)
• (closely related to DCT and JPEG)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 126
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
– DWT
• Definition of DWT and properties
• how to read the DWT scalogram
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 127
Problem #1:
Goal: given a signal (eg., #packets over time)
Find: patterns, periodicities, and/or compress
year
count lynx caught per year
(packets per day;
virus infections per month)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 128
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
value
time
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 129
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
• A: Terrible - all DFT coefficients needed!
0
0.2
0.4
0.6
0.8
1
1.2
1 3 5 7 9 11 13 15
Freq.
Ampl.value
time
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 130
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
• A: Terrible - all DFT coefficients needed!
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
value
time
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 131
Wavelets - DWT
• Similarly, DFT suffers on short-duration
waves (eg., baritone, silence, soprano)
time
value
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 132
Wavelets - DWT
• Solution#1: Short window Fourier
transform (SWFT)
• But: how short should be the window?
time
freq
time
value
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 133
Wavelets - DWT
• Answer: multiple window sizes! -> DWT
time
freq
Time
domain DFT SWFT DWT
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 134
Haar Wavelets
• subtract sum of left half from right half
• repeat recursively for quarters, eight-ths, ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 135
Wavelets - construction
x0 x1 x2 x3 x4 x5 x6 x7
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 136
Wavelets - construction
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......level 1
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 137
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0level 2
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 138
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
etc ...
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 139
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
Q: map each coefficient
on the time-freq. plane
t
f
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 140
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
Q: map each coefficient
on the time-freq. plane
t
f
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 141
Haar wavelets - code
#!/usr/bin/perl5
# expects a file with numbers
# and prints the dwt transform
# The number of time-ticks should be a power of 2
# USAGE
# haar.pl <fname>
my @vals=();
my @smooth; # the smooth component of the signal
my @diff; # the high-freq. component
# collect the values into the array @val
while(<>){
@vals = ( @vals , split );
}
my $len = scalar(@vals);
my $half = int($len/2);
while($half >= 1 ){
for(my $i=0; $i< $half; $i++){
$diff [$i] = ($vals[2*$i] - $vals[2*$i + 1] )/ sqrt(2);
print "\t", $diff[$i];
$smooth [$i] = ($vals[2*$i] + $vals[2*$i + 1] )/ sqrt(2);
}
print "\n";
@vals = @smooth;
$half = int($half/2);
}
print "\t", $vals[0], "\n" ; # the final, smooth component
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 142
Wavelets - construction
Observation1:
„+‟ can be some weighted addition
„-‟ is the corresponding weighted difference
(„Quadrature mirror filters‟)
Observation2: unlike DFT/DCT,
there are *many* wavelet bases: Haar, Daubechies-
4, Daubechies-6, Coifman, Morlet, Gabor, ...
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 143
Wavelets - how do they look
like?
• E.g., Daubechies-4
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 144
Wavelets - how do they look
like?
• E.g., Daubechies-4
?
?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 145
Wavelets - how do they look
like?
• E.g., Daubechies-4
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 146
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
– DWT
• Definition of DWT and properties
• how to read the DWT scalogram
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 147
Wavelets - Drill#1:
t
f
• Q: baritone/silence/soprano - DWT?
time
value
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 148
Wavelets - Drill#1:
t
f
• Q: baritone/silence/soprano - DWT?
time
value
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 149
Wavelets - Drill#2:
• Q: spike - DWT?
t
f
1 2 3 4 5 6 7 8
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 150
Wavelets - Drill#2:
t
f
• Q: spike - DWT?
1 2 3 4 5 6 7 8
0.00 0.00 0.71 0.00
0.00 0.50
-0.35
0.35
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 151
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 152
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 153
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 154
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 155
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 156
Wavelets - Drill#3:
• Q: DFT?
t
f
t
f
DWT DFT
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 157
Advantages of Wavelets
• Better compression (better RMSE with same
number of coefficients - used in JPEG-2000)
• fast to compute (usually: O(n)!)
• very good for „spikes‟
• mammalian eye and ear: Gabor wavelets
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 158
Overall Conclusions
• DFT, DCT spot periodicities
• DWT : multi-resolution - matches
processing of mammalian ear/eye better
• All three: powerful tools for compression,
pattern detection in real signals
• All three: included in math packages
– (matlab, „R‟, mathematica, … - often in
spreadsheets!)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 159
Overall Conclusions
• DWT : very suitable for self-similar traffic
• DWT: used for summarization of streams [Gilbert+01], db histograms etc
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 160
Resources - software and urls
• http://www.dsptutor.freeuk.com/jsanalyser/
FFTSpectrumAnalyser.html : Nice java
applets for FFT
• http://www.relisoft.com/freeware/freq.html
voice frequency analyzer (needs
microphone)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 161
Resources: software and urls
• xwpl: open source wavelet package from
Yale, with excellent GUI
• http://monet.me.ic.ac.uk/people/gavin/java
/waveletDemos.html : wavelets and
scalograms
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 162
Books
• William H. Press, Saul A. Teukolsky, William T.
Vetterling and Brian P. Flannery: Numerical Recipes in C,
Cambridge University Press, 1992, 2nd Edition. (Great
description, intuition and code for DFT, DWT)
• C. Faloutsos: Searching Multimedia Databases by Content,
Kluwer Academic Press, 1996 (introduction to DFT,
DWT)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 163
Additional Reading
• [Gilbert+01] Anna C. Gilbert, Yannis Kotidis and S.
Muthukrishnan and Martin Strauss, Surfing Wavelets on
Streams: One-Pass Summaries for Approximate Aggregate
Queries, VLDB 2001
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 164
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 165
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 166
Forecasting
"Prediction is very difficult, especially about
the future." - Nils Bohr
http://www.hfac.uh.edu/MediaFutures/thoughts.html
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 167
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 168
Problem#2: Forecast
• Example: give xt-1, xt-2, …, forecast xt
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 169
Forecasting: Preprocessing
MANUALLY:
remove trends spot periodicities
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10
0
0.5
1
1.5
2
2.5
3
3.5
1 3 5 7 9 11 13
timetime
7 days
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 170
Problem#2: Forecast
• Solution: try to express
xt
as a linear function of the past: xt-2, xt-2, …,
(up to a window of w)
Formally:
0102030405060708090
1 3 5 7 9 11Time Tick
??noisexaxax wtwtt 11
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 171
(Problem: Back-cast; interpolate)• Solution - interpolate: try to express
xt
as a linear function of the past AND the future:
xt+1, xt+2, … xt+wfuture; xt-1, … xt-wpast
(up to windows of wpast , wfuture)
• EXACTLY the same algo‟s
0102030405060708090
1 3 5 7 9 11Time Tick
??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 172
Linear Regression: idea
40
45
50
55
60
65
70
75
80
85
15 25 35 45
Body weight
patient weight height
1 27 43
2 43 54
3 54 72
……
…
N 25 ??
• express what we don‟t know (= „dependent variable‟)
• as a linear function of what we know (= „indep. variable(s)‟)
Body height
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 173
Linear Auto Regression:
Time Packets
Sent (t-1)
Packets
Sent(t)
1 - 43
2 43 54
3 54 72
……
…
N 25 ??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 174
Linear Auto Regression:
40
45
50
55
60
65
70
75
80
85
15 25 35 45
Number of packets sent (t-1)N
um
ber
of
pa
cket
s se
nt
(t)
Time Packets
Sent (t-1)
Packets
Sent(t)
1 - 43
2 43 54
3 54 72
……
…
N 25 ??
• lag w=1
• Dependent variable = # of packets sent (S [t])
• Independent variable = # of packets sent (S[t-1])
„lag-plot‟
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 175
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 176
More details:
• Q1: Can it work with window w>1?
• A1: YES!
xt-2
xt
xt-1
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 177
More details:
• Q1: Can it work with window w>1?
• A1: YES! (we‟ll fit a hyper-plane, then!)
xt-2
xt
xt-1
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 178
More details:
• Q1: Can it work with window w>1?
• A1: YES! (we‟ll fit a hyper-plane, then!)
xt-2
xt-1
xt
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 179
More details:
• Q1: Can it work with window w>1?
• A1: YES! The problem becomes:
X[N w] a[w 1] = y[N 1]
• OVER-CONSTRAINED– a is the vector of the regression coefficients
– X has the N values of the w indep. variables
– y has the N values of the dependent variable
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 180
More details:
• X[N w] a[w 1] = y[N 1]
N
w
NwNN
w
w
y
y
y
a
a
a
XXX
XXX
XXX
2
1
2
1
21
22221
11211
,,,
,,,
,,,
Ind-var1 Ind-var-w
time
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 181
More details:
• X[N w] a[w 1] = y[N 1]
N
w
NwNN
w
w
y
y
y
a
a
a
XXX
XXX
XXX
2
1
2
1
21
22221
11211
,,,
,,,
,,,
Ind-var1 Ind-var-w
time
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 182
More details
• Q2: How to estimate a1, a2, … aw = a?
• A2: with Least Squares fit
• (Moore-Penrose pseudo-inverse)
• a is the vector that minimizes the RMSE
from y
a = ( XT X )-1 (XT y)
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 183
Even more details
• Q3: Can we estimate a incrementally?
• A3: Yes, with the brilliant, classic method
of „Recursive Least Squares‟ (RLS) (see,
e.g., [Yi+00], for details) - pictorially:
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 184
Even more details
• Given:
Independent Variable
Dep
enden
t V
aria
ble
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 185
Even more details
Independent Variable
Dep
enden
t V
aria
ble
.
new point
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 186
Even more details
Independent Variable
Dep
enden
t V
aria
ble
RLS: quickly compute new best fit
new point
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 187
Even more details
• Straightforward Least
Squares
– Needs huge matrix
(growing in size)
O(N×w)
– Costly matrix operation
O(N×w2)
• Recursive LS
– Need much smaller, fixed
size matrix
O(w×w)
– Fast, incremental
computation
O(1×w2)
N = 106, w = 1-100
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 188
Even more details
• Q4: can we „forget‟ the older samples?
• A4: Yes - RLS can easily handle that
[Yi+00]:
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 189
Adaptability - „forgetting‟
Independent Variable
eg., #packets sent
Dep
enden
t V
aria
ble
eg.,
#by
tes
sent
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 202
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 203
Co-Evolving Time Sequences
• Given: A set of correlated time sequences
• Forecast „Repeated(t)‟
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sent
lost
repeated
??
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 204
Solution:
Q: what should we do?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 205
Solution:
Least Squares, with
• Dep. Variable: Repeated(t)
• Indep. Variables: Sent(t-1) … Sent(t-w);
Lost(t-1) …Lost(t-w); Repeated(t-1), ...
• (named: „MUSCLES‟ [Yi+00])
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 206
Time Series Analysis - Outline
• Auto-regression
• Least Squares; recursive least squares
• Co-evolving time sequences
• Examples
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 207
Conclusions - Practitioner‟s
guide
• AR(IMA) methodology: prevailing method
for linear forecasting
• Brilliant method of Recursive Least Squares
for fast, incremental estimation.
• See [Box-Jenkins]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 208
Resources: software and urls
• MUSCLES: Prof. Byoung-Kee Yi:
http://www.postech.ac.kr/~bkyi/
• free-ware: „R‟ for stat. analysis
(clone of Splus)
http://cran.r-project.org/
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 209
Books
• George E.P. Box and Gwilym M. Jenkins and Gregory C.
Reinsel, Time Series Analysis: Forecasting and Control,
Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.)
• Brockwell, P. J. and R. A. Davis (1987). Time Series:
Theory and Methods. New York, Springer Verlag.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 210
Additional Reading
• [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony
Brockwell and Christos Faloutsos Adaptive, Hands-Off
Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003
• [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for
Co-Evolving Time Sequences, ICDE 2000. (Describes
MUSCLES and Recursive Least Squares)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 211
Next: Kalman filters
BREAK!
CMU SCS
Indexing and Mining Time
Sequences
Part 4
Kalman Filters
Christos Faloutsos
Lei Li
CMU
KDD 2010 1Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 2
CMU SCS
Intuition
• Tracking moving objects, estimate velocity
and acceleration on the fly
3KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
from FIFA 2010RoboCup 2010
CMU SCS
Linear Dynamical System
• Known parameters
– Original Kalman Filters [Kalman 1960, Rauch
1965]
• Unknown parameters
– Parameter estimation through EM algorithm
[Shumway et al 1982, Ghahramani 1996]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 4
CMU SCS
Kalman Filters
Given observations of the soccer ball position
t=1..T, “Model parameters”
Goal: two types of prediction
Kalman filtering: Estimate the true position,
velocity & acceleration based on the
previous observations
Kalman smoothing: Estimate for every time
tick, based on all observationsKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 5
?
?
CMU SCS
Kalman Filters (intuition)
t=1, soccer with initial pos, vel and acc.
To estimate the future
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 6
v1
a1
p1
past observations
CMU SCS
Kalman Filters (intuition)
t=2, according to Newton’s law, it should be...
7KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p~
2v~2a~
CMU SCS
Kalman Filters (intuition)
t=2, according to Newton’s law, it should be...
however, imperfect soccer/kick movement…
8KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
CMU SCS
Kalman Filters (intuition)
Now take a photo, due to imperfect camera…
9KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
CMU SCS
Kalman Filters (intuition)
What is the best estimate for next time tick?
10KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
CMU SCS
Kalman Filters (intuition)
What is the best estimate for next time tick?
11KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p
2v2a
CMU SCS
Some math notationname
A transition matrix
C transmission/ projection/
output matrix
Q transition covariance
R transmission/ projection/
output covariance
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 12
v1
a1
p1
‘Newton’s dynamics’: Transition matrix A
2p~
2v~2a~
v1
a1
p1
2p
2v2a
CMU SCS
Example
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 13
observation
hidden states
(details)
z1=(p1, v1, a1)T
x1=(observed1)
A= 1,1,1/2
0, 1, 1
0, 0, 1
v1
a1
p1
2p
2v2a
C=(1, 0, 0)
transition matrix
output matrix
transition covariance Q
output covariance R
CMU SCS
Example
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 14
observation
hidden states
(details)
z1=(p1, v1, a1)T
x1=(observed1)
A= 1,1,1/2
0, 1, 1
0, 0, 1
C=(1, 0, 0)
transition matrix
output matrix
p2=p1+ v1*∆t + 0.5*a1*∆t2
v2=v1+a1*∆t
a2=a1
v1
a1
p1
CMU SCS
Kalman Filtering (intuition)
Step1, forecast next time tick before observation
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 15
v1
a1
p1
‘Newton’s dynamics’:
Transition matrix A
2p~
2v~2a~
CMU SCS
Kalman Filtering (intuition)
Step 2: adjust estimation after observation
16KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p
2v2a
CMU SCS
Example: Kalman filtering
(forward)
17
Time*
observed
Position
*
** *
*
Given:
a sequence of observations, Model parameters (A, C …)
Goal: remove noise and forecast real position
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Example: Kalman filtering
(forward)
18
Time*
observed
estimated
Position
[R. E. Kalman. A new approach to linear filtering and prediction problems. 1960]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
t=1
CMU SCS
Example: Kalman filtering
19
Time*
observed
estimated
Position
*
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
t=2
Intuition: #2 may
be close to #1
Using the “Newton
Dynamics”
CMU SCS
20
Time*
observed
Position
** *
*
Example: Kalman filtering
*estimated
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Kalman Filters
Given observations of the soccer position
t=1..T, and model parameters (A, C …)
Goal: two types of prediction
Kalman filtering: Estimate the true position,
velocity & acceleration based on the
previous observations
Kalman smoothing: Estimate for every time
tick, based on all observationsKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 21
?
?
CMU SCS
Kalman Smoothing
Given: all observation x1, …, xn
Estimate: the hidden state for every time tick
zt (t=1..n)
Difference from Kalman filtering:
bring future observation back in history
estimate
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 22
??
CMU SCS
23
Time
Forward
*
observed
Position
** *
*
Recap: Kalman filtering
*estimated
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Example: Kalman Smoothing
24
Backward
Time*
observed
Position
** *
*
**
11
1
1
ˆ
)ˆ
(ˆˆ
)ˆˆ(ˆz
n
T
nn
T
nnnnnn
nnnnn
PAVJ
JPVJVV
zAzJz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Rauch et al, Maximum likelihood estimates of linear dynamic systems. 1965]
CMU SCS
Example: Kalman Smoothing
25
Backward
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
1
1
1
ˆ
)ˆ
(ˆˆ
)ˆˆ(ˆz
n
T
nn
T
nnnnnn
nnnnn
PAVJ
JPVJVV
zAzJz
2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Example: Kalman Smoothing
26
Backward
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
Reconstructed
signal after
smoothing
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Outline
• Intuition, example, and definition
– Original Kalman
– Kalman filters with parameter estimation
• Extensions
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 27
CMU SCS
What if not know the model
parameters?
• E.g. Datacenter sensor temperatures
• no longer “Newton dynamics”
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 28
v1
a1
p1
Transition matrix A =
output matrix C =
2p
2v2a
CMU SCS
Graphical Model Representation
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 29
z1 = µ0 + ω0
zn+1= A∙zn + ωn
xn = C∙zn + εn
Model parameters:
θ={µ0, Q0, A, Q, C, R}
observation
hidden states
(details)
Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R) …
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
Gaussian
noise
CMU SCS
Linear Dynamical Systems:
parametersname meaning & example
µ0 initial state for hidden
variable
e.g. initial position, velocity &
acceleration
A transition matrix how the states move forward, e.g.
soccer flying in the air
C transmission/ projection/
output matrix
hidden state observation, e.g.
camera taking picture of the soccer
Q0 Initial covariance
Q transition covariance how precision is the soccer motion
R transmission/ projection
covariance
i.e. observation noise; e.g. how
accurate is the cameraKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 30
CMU SCS
Estimating parameters of LDS• Given: a sequence of observations (e.g. car
positions)
• Find: best-fit parameters
• Basic principle: maximum likelihood
• Through Expectation-Maximization Alg.
31
Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R)…
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
CMU SCS
Learning LDS: EM alg.
• E-step: Kalman filtering-smoothing
– Estimate the hidden variables based on
observation and current parameters
• M-step:
– Update parameters (A, C…) to
maximize
the log-likelihood.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 32
Z1
Z2
Z3
Z4
X1
X2
X3
X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R)…
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
[Shumway et al 1982, Ghahramani 1996]
CMU SCS
EM alg. Intuition
33
E-step: compute hidden states using Kalman
filtering-smoothing (exactly as before)
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
Reconstructed
signal after
smoothing
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
EM alg.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 34
M-step: Maximizing the log-likelihood
by solving 0A
L
0C
L
...
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 35
CMU SCS
Motivation Examples
• Motion Capture:– Markers on human actors
– Cameras used to track the 3D positions
– Duration: 100-500
– 93 dimensional body-local coordinates after preprocessing (31-bones)
• Sensor data missing due to:– Low battery
– RF error
36
From mocap.cs.cmu.edu
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Time
sensor 1
sensor 2
…
sensorm
blackout
Illustration of data
37KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
DynaMMo: Intuition
38
Position of
Left hand
marker
Position of
right hand
marker
missing
Recover using
Correlation
among multiple
sequences
[Lei Li et al DynaMMo. KDD 2009]KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
DynaMMo: Intuition
39
missing
Recover using
Dynamics
temporal moving
pattern
Position of
Left hand
marker
Position of
right hand
marker
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Lei Li et al DynaMMo. KDD 2009]
CMU SCS
LDS with Missing Value
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 40
observation
hidden states Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R) …
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
partially
observed
CMU SCS
DynaMMo Illustration:
estimate all colored elements
41x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
CMU SCS
DynaMMo Illustration: step 1
estimate hidden variables
42x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
CMU SCS
DynaMMo Illustration: step 2
estimate missing values
43x1 x2 x3
× × ×
×× ×
Details in [Li+2009]
z1 z2 z3
C C C
A A A
CMU SCS
DynaMMo Illustration: step 3
update model parameters
44x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
CMU SCS
DynaMMo Intuition, forward-
backward algorithm
• How to recover the missing values?
45x1 x2 x3
CMU SCS
DynaMMo: How to Recover?
46
×
x1
hidden variables
e.g. velocity,
acceleration
projection
matrix C
z1
C
CMU SCS
DynaMMo: How to Recover?
47
×
×
×
x1 x2
Transition
matrix
projection
matrix
z1 z2
A
C C
CMU SCS
DynaMMo: How to Recover?
48
×
x1 x2
× ×
×z1 z2 z3
C C
x3
A A
CMU SCS
DynaMMo: How to Recover?
49x1 x2 x3
× × ×
××
C C C
z1 z2 z3
A A
CMU SCS
DynaMMo: How to Recover?
50x1 x2 x3
× × ×
×× ×
C C C
z1 z2 z3
similarly backward pass
A A A
CMU SCS
DynaMMo Learning
51
Guess Initial
Estimate Hidden
Recover Missing
Update Model
Fix X,
Estimate P(Z|X;):E(zn|X;), E(znz’n|X;)E(znz’n+1|X;)
fix Z,
estimate
E(X_missing|Z;)Using E(zn|X;),
Fix both X and Z,
estimate new model
parameters
argmax E[log(X,Z;)]
Random
Guess model
parameters
1
2
3
0
(details)
CMU SCS
Result – Better Missing Value Recovery
52
Reconstruction
error
Average missing length
Ideal
Proposed DynaMMo
MSVD
[Srebro’03]Linear
Interpolation
Spline
Dataset:
CMU Mocap #16
mocap.cs.cmu.edumore results in [Li+2009]
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 53
CMU SCS
Switching LDS: Intuition
• Tracking human motion
– Start from slow walking
– Transition to running for a while
– Gradually stop
• Each part corresponds to a different
state & dynamics
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 54
CMU SCS
Switching LDS
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 55
S={running,
walking}
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
CMU SCS
Switching LDS: example
S=1, walking
A=A1, C=C1
S=2, running
A=A2 , C=C2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 56
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
CMU SCS
Switching LDS in Penalty Kick
Before touching ball
S=1, running towards left
A=A1, C=C1
After hitting ball
S=2, kicking towards right
A=A2 , C=C2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 57
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
From FIFA 2010
CMU SCS
Estimating Parameters for SLDS
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 58
Approximate inference: Variational EM
[Ghahramani&Hinton. Variational learning for switching state-space
models.2000]
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
Z2,1 Z2,2 Z2,3 Z2,4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
E-step
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 59
CMU SCS
Particle Filters
• What if non-Gaussian noise?
• Inference using Markov chain Monte Carlo
sampling
60[Gordon et al, Nonlinear/non-gaussian bayesian state
estimation. 1993]
v1
a1
p1
v2
a2
p2
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 61
CMU SCS
How to Segment
• Segment by threshold on reconstruction error
62
original data
reconstruction
error
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Lei Li et al DynaMMo. KDD 2009]
CMU SCS
Results – Segmentation
• Find the transition during “running” to
“stop”.
63
left hip
left femur
reconstruction
error
KDD 2010
CMU SCS
Results – Segmentation• Find the transition during “running” to
“stop”.
64
left hip
left femur
reconstruction
error
run stopslow
down
KDD 2010
CMU SCS
How to Compress (DynaMMo)
65
Original data w/ missing values
hidden variables
keep only a portion
(optimal samples)
C
DynaMMo
A
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 66
CMU SCS
Challenge illustration
Expectation-Maximization Alg.
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Timeline for E-step (forward-backward) in learning LDS
1 2 3 4 5
67
EM can
only use
single CPU
due to data
dependency
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Parallel learning by
Cut-And-Stitch Method
Step 1
Step 2
Step 3
Step 4
1 2 3 4 5
68
Goal:
with 2 CPUs
[Li et al 2008b]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Li et al Cut-And-Stitch. KDD 2008]
CMU SCS
It works!
69
sp
eed
up
# of processors
ideal
Proposed
Cut-And-Stitch
EM algorithm
Dataset:
58 motion sequences
CMU Mocap #16
mocap.cs.cmu.edu,
tested on NCSA super
computer,
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 70
CMU SCS
Motion Stitching
A Database Approach• Select best
stitchable
segments from a
set of basic motion
pieces and
generate new
natural motions
71KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Problem Definition
• Given two motion-capture sequences that are to be
stitched together, how can we assess the goodness
of the stitching?
72
12
3
Which stitching looks best?
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Proposed Method:
Laziness Score [Li+2008a]
• Conjecture: less human effort more natural
• Proposed: use Kalman filters to estimate position,
velocity, acceleration Compute effort/ energy
73KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Which continues to?
Green or Blue?
74
straight moving U-Turn
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
CMU SCS
Result – Laziness-score prefers
straightforward moving
75
straight moving U-Turn
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Li et al Laziness Score. Eurographics 2008]
CMU SCS
Conclusion• Intuition, example, and definition
– Original kalman filter (known parameters)• Kalman filtering
• Kalman smoothing
– Kalman filters with parameter estimation (EM)
• Extensions– Handling missing values
– Switching linear dynamical
– Particle filters (MCMC sampling)
• Kalman filters at work– Segmentation & compression
– Parallel learning
– Motion stitchingKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 76
CMU SCS
References
• Zoubin Ghahramani and Geoffrey E. Hinton. Parameter estimation for linear dynamical systems. 1996
• Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
• N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-gaussian bayesian state estimation. IEE Proceedings F Radar and Signal Processing, 140(2):107–113, 1993.
• R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the em algorithm. Journal of Time Series Analysis, 3:253–264, 1982.
• R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME šC Journal of Basic Engineering, (82 (Series D)):35–45, 1960.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 77
CMU SCS
References (cont’)
• H. Rauch, F Tung, and C T Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, pages 1445–1450, 1965.
• Lei Li, Wenjie Fu, Fan Guo, Todd C Mowry, and Christos Faloutsos. Cut-and-stitch: efficient parallel learning of linear dynamical systems on smps. In Proceedings of the 14th ACM SIGKDD, pages 471–479, 2008.
• Lei Li, James McCann, Christos Faloutsos, and Nancy Pollard. Laziness is a virtue: Motion stitching using effort minimization. In Short Papers Proceedings of EUROGRAPHICS, 2008.
• Lei Li, James McCann, Nancy S Pollard, and Christos Faloutsos. DynaMMo: mining and summarization of coevolving sequences with missing values. In Proceedings of the 15th ACM SIGKDD,, pages 507–516, 2009.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 78
CMU SCS
Software
• DynaMMo code (matlab) for missing value,
compression & segmentation.
• Parallel learning (in C) for LDS
• http://www.cs.cmu.edu/~leili/
• http://www.cs.cmu.edu/~leili/pubs/dynamm
o.2.1.2.zip
• http://www.cs.cmu.edu/~leili/paralearn/para
learn.0.1.zip (running on gcc 4.2.0 above)
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 79
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 212
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 213
Outline• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 214
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Results
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 215
Recall: Problem #1:
Goal: given a signal (eg., #bytes over time)
Find: patterns, periodicities, and/or compress
time
#bytes Bytes per 30‟
(packets per day;
earthquakes per year)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 216
Problem #1
• model bursty traffic
• generate realistic traces
• (Poisson does not work)
time
# bytes
Poisson
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 217
Motivation
• predict queue length distributions (e.g., to
give probabilistic guarantees)
• “learn” traffic, for buffering, prefetching,
„active disks‟, web servers
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 218
Q: any „pattern‟?
time
# bytes• Not Poisson
• spike; silence; more
spikes; more silence…
• any rules?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 219
solution: self-similarity
# bytes
time time
# bytes
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 220
solution: self-similarity
# bytes
time time
# bytes
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 221
solution: self-similarity
# bytes
time time
# bytes
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 222
solution: self-similarity
# bytes
time time
# bytes
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 223
But:
• Q1: How to generate realistic traces;
extrapolate; give guarantees?
• Q2: How to estimate the model parameters?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 224
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Results
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 225
Approach
• Q1: How to generate a sequence, that is
– bursty
– self-similar
– and has similar queue length distributions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 226
Approach
• A: „binomial multifractal‟ [Wang+02]
• ~ 80-20 „law‟:
– 80% of bytes/queries etc on first half
– repeat recursively
• b: bias factor (eg., 80%)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 227
binary multifractals
20 80
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 228
binary multifractals
20 80
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 229
Parameter estimation
• Q2: How to estimate the bias factor b?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 230
Parameter estimation
• Q2: How to estimate the bias factor b?
• A: MANY ways [Crovella+96]
– Hurst exponent
– variance plot
– even DFT amplitude spectrum! („periodogram‟)
– More robust: „entropy plot‟ [Wang+02]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 231
Entropy plot
• Rationale:
– burstiness: inverse of uniformity
– entropy measures uniformity of a distribution
– find entropy at several granularities, to see
whether/how our distribution is close to
uniform.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 232
Entropy plot
• Entropy E(n) after n
levels of splits
• n=1: E(1)= - p1 log2(p1)-
p2 log2(p2)
p1 p2
% of bytes
here
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 233
Entropy plot
• Entropy E(n) after n
levels of splits
• n=1: E(1)= - p1 log(p1)-
p2 log(p2)
• n=2: E(2) = - Si p2,i *
log2 (p2,i)
p2,1 p2,2 p2,3 p2,4
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 234
Real traffic
• Has linear entropy plot
(-> self-similar)
# of levels (n)
Entropy
E(n)
0.73
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 235
Observation - intuition:
intuition: slope =
intrinsic dimensionality =
info-bits per coordinate-bit
– unif. Dataset: slope =1
– multi-point: slope = 0
# of levels (n)
Entropy
E(n)
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 236
Entropy plot - Intuition
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Pick a point;
reveal its coordinate bit-by-bit -
how much info is each bit worth to me?
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 237
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
„info‟ value = E(1): 1 bit
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 238
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
Is next MSB =0?
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 239
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
Is next MSB =0?
Info value =1 bit
= E(2) - E(1) =
slope!
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 240
Entropy plot
• Repeat, for all points at same position:
Dim=0
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 241
Entropy plot
• Repeat, for all points at same position:
• we need 0 bits of info, to determine position
• -> slope = 0 = intrinsic dimensionality
Dim=0
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 242
Entropy plot
• Real (and 80-20) datasets can be in-
between: bursts, gaps, smaller bursts,
smaller gaps, at every scale
Dim = 1
Dim=0
0<Dim<1
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 243
(Fractals)
• What set of points could have behavior
between point and line?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 244
Cantor dust
• Eliminate the middle third
• Recursively!
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 245
Cantor dust
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 246
Cantor dust
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 247
Cantor dust
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 248
Cantor dust
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 249
Dimensionality?
(no length; infinite # points!)
Answer: log2 / log3 = 0.6
Cantor dust
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 250
Some more entropy plots:
• Poisson vs real
Poisson: slope = ~1 -> uniformly distributed
1 0.73
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 251
B-model
• b-model traffic gives perfectly
linear plot
• Lemma: its slope is
slope = -b log2b - (1-b) log2 (1-b)
• Fitting: do entropy plot; get
slope; solve for b
E(n)
n
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 252
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Experiments - Results
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 253
Experimental setup
• Disk traces (from HP [Wilkes 93])
• web traces from LBL
http://repository.cs.vt.edu/
lbl-conn-7.tar.Z
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 254
Model validation
• Linear entropy plots
Bias factors b: 0.6-0.8
smallest b / smoothest: nntp traffic
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 255
Web traffic - results
• LBL, NCDF of queue lengths (log-log scales)
(queue length l)
Prob( >l)
How to give guarantees?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 256
Web traffic - results
• LBL, NCDF of queue lengths (log-log scales)
(queue length l)
Prob( >l)20% of the requests
will see
queue lengths <100
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 257
Conclusions
• Multifractals (80/20, „b-model‟,
Multiplicative Wavelet Model (MWM)) for
analysis and synthesis of bursty traffic
• can give (probabilistic) guarantees
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 258
Books
• Fractals: Manfred Schroeder: Fractals, Chaos, Power
Laws: Minutes from an Infinite Paradise W.H. Freeman
and Company, 1991 (Probably the BEST book on
fractals!)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 259
Further reading:
• Crovella, M. and A. Bestavros (1996). Self-Similarity in
World Wide Web Traffic, Evidence and Possible Causes.
Sigmetrics.
• [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger,
D.V. Wilson, On the Self-Similar Nature of Ethernet
Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15,
Feb. 1994.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 260
Further reading
• [Riedi+99] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R.
G. Baraniuk, A Multifractal Wavelet Model with
Application to Network Traffic, IEEE Special Issue on
Information Theory, 45. (April 1999), 992-1018.
• [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang
Chang, Spiros Papadimitriou and Christos Faloutsos, Data
Mining Meets Performance Evaluation: Fast Algorithms
for Modeling Bursty Traffic, ICDE 2002, San Jose, CA,
2/26/2002 - 3/1/2002.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 261
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 262
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 263
Detailed Outline
• Non-linear forecasting
– Problem
– Idea
– How-to
– Experiments
– Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 264
Recall: Problem #1
Given a time series {xt}, predict its future
course, that is, xt+1, xt+2, ...
Time
Value
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 265
How to forecast?
• ARIMA - but: linearity assumption
• ANSWER: „Delayed Coordinate
Embedding‟ = Lag Plots [Sauer92]
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 266
General Intuition (Lag Plot)
xt-1
xt
4-NNNew Point
Interpolate
these…
To get the final
prediction
Lag = 1,
k = 4 NN
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 267
Questions:
• Q1: How to choose lag L?
• Q2: How to choose k (the # of NN)?
• Q3: How to interpolate?
• Q4: why should this work at all?
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 268
Q1: Choosing lag L
• Manually (16, in award winning system by
[Sauer94])
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 269
Q2: Choosing number of
neighbors k
• Manually (typically ~ 1-10)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 270
Q3: How to interpolate?
How do we interpolate between the
k nearest neighbors?
A3.1: Average
A3.2: Weighted average (weights drop
with distance - how?)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 271
Q3: How to interpolate?
A3.3: Using SVD - seems to perform best
([Sauer94] - first place in the Santa Fe
forecasting competition)
Xt-1
xt
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 272
Q4: Any theory behind it?
A4: YES!
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 273
Theoretical foundation
• Based on the “Takens‟ Theorem”
[Takens81]
• which says that long enough delay vectors
can do prediction, even if there are
unobserved variables in the dynamical
system (= diff. equations)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 274
Theoretical foundation
Example: Lotka-Volterra equations
dH/dt = r H – a H*P
dP/dt = b H*P – m P
H is count of prey (e.g., hare)
P is count of predators (e.g., lynx)
Suppose only P(t) is observed (t=1, 2, …).
H
P
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 275
Theoretical foundation
• But the delay vector space is a faithful
reconstruction of the internal system state
• So prediction in delay vector space is as
good as prediction in state space
Skip
H
P
P(t-1)
P(t)
CMU SCS
Solution to Volterra-Lotka eq.
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 276from wikipedia
time# prey
# predators
prey
predators
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 277
Detailed Outline
• Non-linear forecasting
– Problem
– Idea
– How-to
– Experiments
– Conclusions
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 278
Datasets
Logistic Parabola:xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]
time
x(t
)
Lag-plot
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 279
Datasets
Logistic Parabola:xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]
time
x(t
)
Lag-plot
ARIMA: fails
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 280
Logistic Parabola
Timesteps
Value
Our Prediction from
here
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 281
Logistic Parabola
Timesteps
Value
Comparison of prediction
to correct values
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 282
Datasets
LORENZ: Models convection currents in the air
dx / dt = a (y - x)
dy / dt = x (b - z) - y
dz / dt = xy - c z
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 283
LORENZ
Timesteps
Value
Comparison of prediction
to correct values
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 284
Datasets
Time
Value
• LASER: fluctuations in a Laser over time (used in Santa Fe competition)
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 285
Laser
Timesteps
Value
Comparison of prediction
to correct values
Skip
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 286
Conclusions
• Lag plots for non-linear forecasting
(Takens‟ theorem)
• suitable for „chaotic‟ signals
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 287
References
• Deepay Chakrabarti and Christos Faloutsos F4: Large-
Scale Automated Forecasting using Fractals CIKM 2002,
Washington DC, Nov. 2002.
• Sauer, T. (1994). Time series prediction using delay
coordinate embedding. (in book by Weigend and
Gershenfeld, below) Addison-Wesley.
• Takens, F. (1981). Detecting strange attractors in fluid
turbulence. Dynamical Systems and Turbulence. Berlin:
Springer-Verlag.
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 288
References
• Weigend, A. S. and N. A. Gerschenfeld (1994). Time
Series Prediction: Forecasting the Future and
Understanding the Past, Addison Wesley. (Excellent
collection of papers on chaotic/non-linear forecasting,
describing the algorithms behind the winners of the Santa
Fe competition.)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 289
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 290
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 291
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 292
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 293
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
• Bursty traffic: multifractals (80-20 „law‟)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 294
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
• Bursty traffic: multifractals (80-20 „law‟)
• Non-linear forecasting: lag-plots (Takens)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 295
christos AT cs.cmu.edu
www.cs.cmu.edu/~christos
leili AT cs.cmu.edu
www.cs.cmu.edu/~leili
www.cs.cmu.edu/~leili/pubs/dynammo.2.1.2.zip
THANK YOU!