indexing of time series by major minima and maxima eugene fink kevin b. pratt harith s. gandhi
Post on 20-Dec-2015
Indexing of Time Series by Major Minima and Maxima
Eugene Fink
Kevin B. Pratt
Harith S. Gandhi
Time series
A time series is a sequence of real values measured at equal intervals.
Example: 0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0
Results
• Compression of a time series by extracting its major minima and maxima
• Indexing of compressed time series
• Retrieval of series similar to a given pattern
• Experiments with stock and weather series
Outline
• Compression
• Indexing
• Retrieval
• Experiments
Compression
We select major minima and maxima, along with the start point and end point, and discard the other points.
We use a positive parameter R to control the compression rate.
Major minima
A point a[m] in a[1..n] is a major minimum if there are i and j, where i < m < j, such that:
• a[m] is a minimum among a[i..j], and
• a[i] – a[m] ≥ R and a[j] – a[m] ≥ R.
Major maxima
A point a[m] in a[1..n] is a major maximum if there are i and j, where i < m < j, such that:
• a[m] is a maximum among a[i..j], and
• a[m] – a[i] ≥ R and a[m] – a[j] ≥ R.
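The two definitions can be checked directly, point by point. The sketch below is only an illustration of the definitions, not the compression procedure from the talk: it uses 0-based indices, assumes the comparison with R is "at least R", and the helper names are made up for this example.

```python
def _reaches_r(a, m, R, step, sign):
    """Scan from m in one direction (step = -1 or +1).

    sign = +1 tests a[m] as a minimum, sign = -1 as a maximum.
    Returns True if a point at distance >= R is found before
    a[m] stops being the extremum of the scanned range.
    """
    k = m + step
    while 0 <= k < len(a):
        d = sign * (a[k] - a[m])
        if d < 0:
            return False  # a[m] is no longer the extremum among a[i..j]
        if d >= R:
            return True
        k += step
    return False


def is_major_min(a, m, R):
    # The left side (i) and right side (j) can be tested independently.
    return _reaches_r(a, m, R, -1, +1) and _reaches_r(a, m, R, +1, +1)


def is_major_max(a, m, R):
    return _reaches_r(a, m, R, -1, -1) and _reaches_r(a, m, R, +1, -1)


series = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
minima = [m for m in range(len(series)) if is_major_min(series, m, 3)]
maxima = [m for m in range(len(series)) if is_major_max(series, m, 3)]
print(minima, maxima)
```

For the example series from the first slide and R = 3, this marks positions 4 and 8 as major minima and positions 1, 7, and 11 as major maxima (0-based).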
Compression procedure
The procedure performs one pass through a given series.
It can compress a live series without storing it in memory.
It takes linear time and constant memory.
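A one-pass procedure with these properties could look like the sketch below. It is an assumption about how such a pass may be organized, not the authors' Visual Basic code: it keeps one candidate minimum and one candidate maximum, reports a candidate once the series has moved away from it by at least R, and keeps the first of several tied values. The function name and the (index, value) output format are chosen for this example.

```python
def compress(stream, R):
    """Yield (index, value) for the start point, the major extrema, and the end point."""
    it = iter(stream)
    try:
        first = next(it)
    except StopIteration:
        return
    yield (0, first)
    emitted = 0                    # index of the last point we yielded
    lo = hi = last = (0, first)    # candidate minimum, candidate maximum, latest point
    direction = 0                  # 0: undecided, +1: seeking a maximum, -1: seeking a minimum
    for point in enumerate(it, start=1):
        last = point
        x = point[1]
        if direction == 0:
            if x - lo[1] >= R:
                direction, hi = +1, point
            elif hi[1] - x >= R:
                direction, lo = -1, point
            else:
                if x < lo[1]:
                    lo = point
                if x > hi[1]:
                    hi = point
        elif direction > 0:
            if x > hi[1]:
                hi = point
            elif hi[1] - x >= R:
                yield hi           # hi is confirmed as a major maximum
                emitted = hi[0]
                direction, lo = -1, point
        else:
            if x < lo[1]:
                lo = point
            elif x - lo[1] >= R:
                yield lo           # lo is confirmed as a major minimum
                emitted = lo[0]
                direction, hi = +1, point
    if last[0] != emitted:
        yield last                 # the end point is always kept


series = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
print(list(compress(series, 3)))
```

Because the function is a generator over an iterator, it can consume a live stream and holds only a fixed number of points at any time.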
Indexing of series
We index series in a database by their major inclines, which are upward and downward segments of the series.
Major inclines
A segment a[i..j] is a major upward incline if
• a[i] is a major minimum;
• a[j] is a major maximum;
• for every m strictly between i and j, a[i] < a[m] < a[j].
The definition of a major downward incline is symmetric.
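The incline definition can be applied directly to a series once its major minima and maxima are known. The sketch below is a brute-force check of the definition and takes quadratic time in the worst case; it is not the linear two-pass procedure from the talk. Indices are 0-based, and the function names are hypothetical.

```python
def upward_inclines(a, minima, maxima):
    """Return (i, j) pairs such that a[i..j] is a major upward incline."""
    maxima = set(maxima)
    result = []
    for i in minima:
        interior_max = float("-inf")   # largest value strictly between i and j
        for j in range(i + 1, len(a)):
            if a[j] <= a[i]:
                break                  # no later j can satisfy a[i] < a[m]
            if j in maxima and interior_max < a[j]:
                result.append((i, j))
            interior_max = max(interior_max, a[j])
    return result


def downward_inclines(a, minima, maxima):
    # Symmetric case: negate the series and swap the roles of minima and maxima.
    return upward_inclines([-x for x in a], maxima, minima)


series = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
minima, maxima = [4, 8], [1, 7, 11]    # major extrema of the series for R = 3
print(upward_inclines(series, minima, maxima))
print(downward_inclines(series, minima, maxima))
```

For the example series with R = 3, the upward inclines are a[4..7] and a[8..11], and the downward inclines are a[1..4] and a[7..8].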
Identification of inclines
The procedure performs two passes through a list of major minima and maxima.
Its time is linear in the number of inclines.
Indexing of inclines
We index major inclines of series in a database by their lengths and heights.
We use a range tree, which supports indexing of points by two coordinates.
[Figure: an incline with its length and height marked.]
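To show what a two-coordinate index has to support, here is a deliberately simple stand-in. It is not a range tree: it sorts inclines by length, narrows the length range with binary search, and then filters by height. The class name and the (length, height, payload) layout are assumptions made for this example.

```python
import bisect


class InclineIndex:
    """Toy two-coordinate index over inclines (a stand-in for a range tree)."""

    def __init__(self, inclines):
        # Each incline is a tuple (length, height, payload).
        self._items = sorted(inclines, key=lambda t: t[0])
        self._lengths = [t[0] for t in self._items]

    def query(self, min_len, max_len, min_height, max_height):
        """Return inclines whose length and height both fall in the given ranges."""
        lo = bisect.bisect_left(self._lengths, min_len)
        hi = bisect.bisect_right(self._lengths, max_len)
        return [t for t in self._items[lo:hi]
                if min_height <= t[1] <= max_height]


index = InclineIndex([(3, 3, "a"), (1, 3, "b"), (3, 4, "c"), (10, 1, "d")])
print(index.query(2, 5, 3, 3.5))
```

A range tree answers the same kind of rectangle query without scanning every item in the length range, which matters when many inclines share similar lengths.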
Retrieval
The procedure inputs a pattern series and searches for similar segments in a database.
[Figure: an example pattern and a database series.]
Main steps:
• Find the pattern’s inclines with the greatest height
• Retrieve all segments that have similar inclines
• Compare each of these segments with the pattern
Highest inclines
First, the retrieval procedure identifies the important inclines in the pattern and selects the highest inclines.
[Figure: a pattern with its two highest inclines, labeled 1 and 2, and their lengths and heights.]
Candidate segments
Second, the procedure retrieves segments with similar inclines from the database.
An incline is considered similar if
• its height is between height / C and height · C;
• its length is between length / D and length · D.
We use the range tree to retrieve similar inclines.
[Figure: the range of similar inclines around a pattern incline, bounded by the length and height limits above.]
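The similarity bounds translate into a rectangle in the (length, height) plane. The sketch below computes that rectangle and tests one candidate against it; it assumes positive lengths and heights and C, D of at least 1, and the function names are hypothetical.

```python
def similarity_window(length, height, C, D):
    """Return (min_length, max_length, min_height, max_height) for a pattern incline."""
    return (length / D, length * D, height / C, height * C)


def is_similar(candidate, pattern, C, D):
    """candidate and pattern are (length, height) pairs."""
    min_len, max_len, min_h, max_h = similarity_window(pattern[0], pattern[1], C, D)
    return min_len <= candidate[0] <= max_len and min_h <= candidate[1] <= max_h


# Pattern incline of length 10 and height 4, with C = 2 and D = 5:
print(similarity_window(10, 4, 2, 5))     # lengths 2..50, heights 2..8
print(is_similar((20, 6), (10, 4), 2, 5))
print(is_similar((20, 9), (10, 4), 2, 5))
```

Larger C and D widen the rectangle, so more candidates are retrieved and compared with the pattern, which is consistent with the longer search times reported for C = D = 5 in the experiments.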
Similarity test
Third, the procedure compares the retrieved segments with the pattern, using a given similarity test.
Experiments
We have tested a Visual Basic implementation on a 2.4-GHz Pentium computer.
Data sets:
• Stock prices: 98 series, 60,000 points
• Air and sea temperatures: 136 series, 450,000 points
Stock prices (60,000 points)
Search for 100-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure ("fast ranking"), and the y-axes show the ranks assigned by a slow exhaustive search ("perfect ranking").
• C = D = 5, time: 0.05 sec
• C = D = 2, time: 0.02 sec
• C = D = 1.5, time: 0.01 sec
Stock prices (60,000 points)
Search for 500-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure ("fast ranking"), and the y-axes show the ranks assigned by a slow exhaustive search ("perfect ranking").
• C = D = 5, time: 0.31 sec
• C = D = 2, time: 0.12 sec
• C = D = 1.5, time: 0.09 sec
Temperatures (450,000 points)
Search for 200-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure ("fast ranking"), and the y-axes show the ranks assigned by a slow exhaustive search ("perfect ranking").
• C = D = 5, time: 1.18 sec
• C = D = 2, time: 0.27 sec
• C = D = 1.5, time: 0.14 sec
Conclusions
Main results: Compression and indexing of time series by major minima and maxima.
Current work: Hierarchical indexing by importance levels of minima and maxima.