![Page 1: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/1.jpg)
Visualizing and Discovering Nontrivial Patterns In Large
Time Series DatabasesJessica Lin, Eamonn Keogh, Stefano LoardiInformation Visualization 2005, Vol 4, No 2.
Presented By
Nicholas Chen Samah Ramadan
![Page 2: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/2.jpg)
Time Series• What?
– sequences of values or events changing with time
• Why?– Applications
• Medicine: ECG, EEG
• Finance: stock market, credit cards
• Aerospace: launch telemetry, satellite sensor
• Entertainment: music, movies
![Page 3: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/3.jpg)
Data mining Time series
• Why?– Trend Analysis– Similarity Search
• What tasks?1. Sequence matching: whole / subsequence / chunking
2. Anomaly detection : deviation from normal
3. Motif discovery: overrepresentation
![Page 4: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/4.jpg)
VizTree
• What?– Visualization tool for time series data – Based on subsequence trees
• How?– Time series Symbolic representation– Symbolic representation Tree representation
VizTree
VizTree
![Page 5: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/5.jpg)
Previous Approaches
• Cluster and calendar based visualization– Time series Sequence of day patterns– By bottom-up clustering algorithm– Limitations: calendar pattern data, prior knowledge of
patterns
![Page 6: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/6.jpg)
Previous Approaches (cont..)
• Spiral– Periodic section of time one ring – Data values color and line thickness– Limitations: Data should be periodic (known period)
![Page 7: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/7.jpg)
Previous Approaches (cont..)• TimeSearcher
– Query-by-content– Flexible– User must specify query regions (must know what to
look for)– Scalability issues
![Page 8: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/8.jpg)
VizTree Example
• An interesting problem– Two sets of binary sequences of length 200 were
generated– One sequence generated by a pseudo-random-
number generator by the computer– The other was generated by hand by a group of
volunteers
![Page 9: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/9.jpg)
VizTree Example
• Can you tell who generated which?• VizTree can!
– Subsequence tree representations for all sets of 3 digits in each sequence.
Real Random! Fake random!
![Page 10: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/10.jpg)
Discretizing Time Series
Problem: Most time series are not discrete• Must convert real-valued data to symbols• Symbolic Aggregate approXimation, SAX
– Lower bound symbolic space– Feasible approximation for large databases
• Normalization before discretization (usually)
![Page 11: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/11.jpg)
SAX
– A subsequence C is extracted by a sliding window of length n– Each window is divided into w equal-sized regions– Average the data points in each region– The average will fall into one of α levels (alphabet size)– A symbol corresponds to each level (a, b, c, d … )
– Example: Above window corresponds to a c d c b d b a
![Page 12: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/12.jpg)
A Sample Tree
w regions
α x w leaves (independent of time series length)
α possible levels
![Page 13: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/13.jpg)
VizTree in Action
• Subsequence matching– Would like to find patterns that have certain
characteristics– Do this by specifying a path of nodes through the big
tree
• Demo– Find certain patterns in heartbeat pattern– Notice interactive detail view
![Page 14: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/14.jpg)
VizTree In Action II
• Finding Motifs– VizTree is very good at showing commonly occurring
motifs– Simply look at thick branches
• Demo– Look at what most weeks look like in power
consumption– Can step through, or go directly to a motif
![Page 15: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/15.jpg)
VizTree In Action III
• Finding simple anomalies– One way is to do the opposite of finding motifs– Go through all the thin lines
• Demo– Locate weeks where power consumption is unusual– Also, locate where heart beat is irregular
![Page 16: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/16.jpg)
VizTree In Action IV
• Diff-Tree– For more complex anomalies– Compare a times series against a reference time
series– Three concepts
• Difference in frequencies (Blue or Green)• Confidence (Luminosity)• Difference x Confidence = Surprisingness (Red)
• Demo– Two similar data sets, find areas where they differ– VizTree can rank surprisingness
![Page 17: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/17.jpg)
Numerosity Reduction
• Fancy term for reducing the noise by removing trivial patterns
• Consecutive windows are often similar or identical.• Results in overcounting, can obscure differences
All 3 windows will be “medium – low - high”
![Page 18: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/18.jpg)
VizTree Criticisms
• While exploring VizTree to prepare demos, noticed a couple of issues:– Atrocious time series UI– Parameter values somewhat of a black art– Phase issues– Hierarchical tree structure is somewhat misleading
![Page 19: Visualizing and Discovering Nontrivial Patterns In Large Time Series Databases Jessica Lin, Eamonn Keogh, Stefano Loardi Information Visualization 2005,](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649d4c5503460f94a29e21/html5/thumbnails/19.jpg)
VizTree• Advantages:
– Scalable to large data sets
– Good for tasks it is designed for (finding motifs, anomaly detection, high level sequence search)
• Disadvantages:– Not so good for other data mining tasks
– Not completely intuitive – need to think in terms of the program
– Settings are arbitrary and dataset dependant