Indexing of Time Series by Major Minima and Maxima
Eugene Fink
Kevin B. Pratt
Harith S. Gandhi
Time series
A time series is a sequence of real values measured at equal intervals.
Example: 0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0
[Figure: plot of the example series, with values from 0 to 4]
Results
• Compression of a time series by extracting its major minima and maxima
• Indexing of compressed time series
• Retrieval of series similar to a given pattern
• Experiments with stock and weather series
Outline
• Compression
• Indexing
• Retrieval
• Experiments
Compression
We select major minima and maxima, along with the start point and end point, and discard the other points.
We use a positive parameter R to control the compression rate.
Major minima
A point a[m] in a[1..n] is a major minimum if there are i and j, where i < m < j, such that:
• a[m] is a minimum among a[i..j], and
• a[i] – a[m] ≥ R and a[j] – a[m] ≥ R.
[Figure: a major minimum a[m], with a[i] and a[j] each at least R above it]
Major maxima
A point a[m] in a[1..n] is a major maximum if there are i and j, where i < m < j, such that:
• a[m] is a maximum among a[i..j], and
• a[m] – a[i] ≥ R and a[m] – a[j] ≥ R.
[Figure: a major maximum a[m], with a[i] and a[j] each at least R below it]
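The two definitions can be checked directly, point by point. The Python sketch below is only an illustration of the definitions, not the compression procedure itself; the function names and the 0-based indexing are assumptions made for this example.

```python
from typing import Sequence


def _rises_by(a: Sequence[float], m: int, r: float, step: int) -> bool:
    """Walk from m in direction step (-1 or +1).

    Return True if a point at least r above a[m] is reached
    before any point that lies below a[m].
    """
    k = m + step
    while 0 <= k < len(a):
        if a[k] < a[m]:
            return False  # a[m] would no longer be a minimum among a[i..j]
        if a[k] - a[m] >= r:
            return True
        k += step
    return False


def is_major_minimum(a: Sequence[float], m: int, r: float) -> bool:
    """True if suitable i < m and j > m exist for the minimum definition."""
    return _rises_by(a, m, r, -1) and _rises_by(a, m, r, +1)


def is_major_maximum(a: Sequence[float], m: int, r: float) -> bool:
    """A major maximum of a is a major minimum of the negated series."""
    negated = [-x for x in a]
    return is_major_minimum(negated, m, r)
```

For the example series 0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0 with R = 3, this check marks positions 4 and 8 as major minima and positions 1, 7, and 11 as major maxima (counting from 0).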
Compression procedure
The procedure performs one pass through a given series.
It can compress a live series without storing it in memory.
It takes linear time and constant memory.
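A one-pass compression of this kind can be sketched as a Python generator. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `compress`, the 0-based positions, and the choice of the first point among equal values are all assumptions made here. It keeps only a few variables, so memory use is constant.

```python
from typing import Iterable, Iterator, Tuple

Point = Tuple[int, float]  # (position, value)


def compress(series: Iterable[float], r: float) -> Iterator[Point]:
    """Yield the start point, the major extrema, and the end point."""
    it = enumerate(series)
    try:
        first = next(it)
    except StopIteration:
        return  # empty series: nothing to yield
    yield first
    lo = hi = last = first  # running minimum, running maximum, latest point
    direction = 0           # 0: undecided, +1: tracking a maximum, -1: a minimum
    for point in it:
        last = point
        x = point[1]
        if direction == 0:
            if x < lo[1]:
                lo = point
            if x > hi[1]:
                hi = point
            if hi[1] - lo[1] >= r:
                # The more recent of the two becomes the current candidate.
                direction = 1 if hi[0] > lo[0] else -1
        elif direction == 1:
            if x > hi[1]:
                hi = point          # higher candidate maximum
            elif hi[1] - x >= r:
                yield hi            # drop of at least r confirms the maximum
                lo = point
                direction = -1
        else:
            if x < lo[1]:
                lo = point          # lower candidate minimum
            elif x - lo[1] >= r:
                yield lo            # rise of at least r confirms the minimum
                hi = point
                direction = 1
    if last[0] != first[0]:
        yield last
```

For the example series with R = 3, the sketch keeps the points at positions 0, 1, 4, 7, 8, 11, and 14 (counting from 0) and discards the rest.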
Indexing of series
We index series in a database by their major inclines, which are upward and downward segments of the series.
Major inclines
A segment a[i..j] is a major upward incline if
• a[i] is a major minimum;
• a[j] is a major maximum;
• for every m ∈ [i..j], a[i] < a[m] < a[j].
[Figure: a major upward incline from a[i] to a[j]]
The definition of a major downward incline is symmetric.
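The definition of an upward incline can be applied by brute force to a series whose major minima and maxima are already known. The sketch below is an illustration only and is not the two-pass identification procedure; it tries every pair, so it takes more than linear time. The function name, the 0-based positions, and the reading of the third condition as applying to points strictly between i and j are assumptions.

```python
from typing import List, Sequence, Tuple


def upward_inclines(
    a: Sequence[float], minima: Sequence[int], maxima: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return all (i, j) such that a[i..j] is a major upward incline.

    minima and maxima hold the positions of the major minima and maxima.
    """
    result = []
    for i in minima:
        for j in maxima:
            # Every point strictly inside the segment must lie
            # strictly between the two endpoint values.
            if i < j and all(a[i] < a[m] < a[j] for m in range(i + 1, j)):
                result.append((i, j))
    return result
```

For the series 2, 0, 3, 1, 5, 2 with major minima at positions 1 and 3 and major maxima at positions 2 and 4, the result contains the short inclines (1, 2) and (3, 4) and also the longer incline (1, 4) that spans them.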
Identification of inclines
The procedure performs two passes through a list of major minima and maxima.
Its time is linear in the number of inclines.
Indexing of inclines
We index major inclines of series in a database by their lengths and heights.
We use a range tree, which supports indexing of points by two coordinates.
[Figure: an incline represented as a point with two coordinates, length and height]
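The idea of looking up inclines by two coordinates can be illustrated with a much simpler structure than a range tree. The sketch below is a stand-in, not a range tree: it keeps the inclines sorted by length and filters by height, so a query may scan more entries than a range tree would. The class name, method names, and string labels are hypothetical.

```python
import bisect
from typing import List, Tuple


class InclineIndex:
    """Stores inclines as (length, height, label) and answers rectangle queries."""

    def __init__(self) -> None:
        self._items: List[Tuple[float, float, str]] = []  # kept sorted by length

    def add(self, length: float, height: float, label: str) -> None:
        bisect.insort(self._items, (length, height, label))

    def query(
        self, len_lo: float, len_hi: float, h_lo: float, h_hi: float
    ) -> List[str]:
        """Return labels of inclines with len_lo <= length <= len_hi
        and h_lo <= height <= h_hi."""
        # A one-element tuple sorts before any stored tuple of equal length.
        start = bisect.bisect_left(self._items, (len_lo,))
        found = []
        for length, height, label in self._items[start:]:
            if length > len_hi:
                break
            if h_lo <= height <= h_hi:
                found.append(label)
        return found
```

A query is a rectangle in the length and height plane; every stored incline whose point falls inside the rectangle is returned.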
Retrieval
The procedure inputs a pattern series and searches for similar segments in a database.
[Figure: an example pattern and a database series with numbered matches]
Main steps:
• Find the pattern’s inclines with the greatest height
• Retrieve all segments that have similar inclines
• Compare each of these segments with the pattern
Highest inclines
First, the retrieval procedure identifies the important inclines in the pattern and selects the highest inclines.
[Figure: a pattern with its two highest inclines, numbered 1 and 2, and their lengths and heights]
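Selecting the highest inclines can be sketched as follows. The slides do not say how many inclines are selected, so the default k = 2 is an assumption, as are the names `Incline`, `make_incline`, and `highest_inclines` and the use of j – i as the length.

```python
import heapq
from typing import List, NamedTuple, Sequence


class Incline(NamedTuple):
    start: int     # position of the first point
    end: int       # position of the last point
    length: int    # assumed here to be end - start
    height: float  # absolute difference of the endpoint values


def make_incline(a: Sequence[float], i: int, j: int) -> Incline:
    """Build the incline a[i..j] with its length and height."""
    return Incline(i, j, j - i, abs(a[j] - a[i]))


def highest_inclines(inclines: Sequence[Incline], k: int = 2) -> List[Incline]:
    """Return the k inclines with the greatest height, highest first."""
    return heapq.nlargest(k, inclines, key=lambda inc: inc.height)
```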
Candidate segments
Second, the procedure retrieves segments with similar inclines from the database.
An incline is considered similar if
• its height is between height / C and height · C;
• its length is between length / D and length · D.
We use the range tree to retrieve similar inclines.
[Figure: an incline inside the rectangle of similar lengths and heights]
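The similarity condition translates into a rectangle of lengths and heights. The sketch below assumes that the bounds are inclusive and that C and D are at least 1; the function names are hypothetical.

```python
from typing import Tuple


def similar_range(
    length: float, height: float, c: float, d: float
) -> Tuple[float, float, float, float]:
    """Return (min length, max length, min height, max height)
    for inclines similar to one with the given length and height."""
    return (length / d, length * d, height / c, height * c)


def is_similar(
    cand_length: float,
    cand_height: float,
    length: float,
    height: float,
    c: float,
    d: float,
) -> bool:
    """Check whether a candidate incline falls inside the rectangle."""
    len_lo, len_hi, h_lo, h_hi = similar_range(length, height, c, d)
    return len_lo <= cand_length <= len_hi and h_lo <= cand_height <= h_hi
```

Larger values of C and D widen the rectangle, so more candidate segments are retrieved and the search takes longer.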
Similarity test
Third, the procedure compares the retrieved segments with the pattern, using a given similarity test.
Experiments
We have tested a Visual Basic implementation on a 2.4-GHz Pentium computer.
Data sets:
• Stock prices: 98 series, 60,000 points
• Air and sea temperatures: 136 series, 450,000 points
Stock prices (60,000 points)
Search for 100-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure (fast ranking), and the y-axes are the ranks assigned by a slow exhaustive search (perfect ranking).
• C = D = 5, time: 0.05 sec; fast ranking up to 200, perfect ranking up to 210
• C = D = 2, time: 0.02 sec; fast ranking up to 200, perfect ranking up to 331
• C = D = 1.5, time: 0.01 sec; fast ranking up to 151, perfect ranking up to 400
[Figure: three plots of fast ranking versus perfect ranking, one for each setting]
Stock prices (60,000 points)
Search for 500-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure (fast ranking), and the y-axes are the ranks assigned by a slow exhaustive search (perfect ranking).
• C = D = 5, time: 0.31 sec; fast ranking up to 200, perfect ranking up to 202
• C = D = 2, time: 0.12 sec; fast ranking up to 200, perfect ranking up to 328
• C = D = 1.5, time: 0.09 sec; fast ranking up to 167, perfect ranking up to 400
[Figure: three plots of fast ranking versus perfect ranking, one for each setting]
Temperatures (450,000 points)
Search for 200-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure (fast ranking), and the y-axes are the ranks assigned by a slow exhaustive search (perfect ranking).
• C = D = 5, time: 1.18 sec; fast ranking up to 200, perfect ranking up to 202
• C = D = 2, time: 0.27 sec; fast ranking up to 151, perfect ranking up to 400
• C = D = 1.5, time: 0.14 sec; fast ranking up to 82, perfect ranking up to 400
[Figure: three plots of fast ranking versus perfect ranking, one for each setting]
Conclusions
Main results: Compression and indexing of time series by major minima and maxima.
Current work: Hierarchical indexing by importance levels of minima and maxima.
[Figure: a series with its minima and maxima labeled by importance levels from 1 to 4]