ICML 2006, CMU, Pittsburgh
Fast Time Series Classification Using Numerosity Reduction
Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, Chotirat Ann Ratanamahatana
Department of Computer Science and Engineering, University of California, Riverside
June 28, 2006
Overview
- Background
- Motivation
- Naïve rank reduction
- Adaptive warping window in DTW
- Experimental results
- Conclusions
Time Series

[Figure: three example time series, labeled A, B, and C: the outline of a Flat-tailed Horned Lizard, an ECG heartbeat, and a stock price, each over roughly 1,400 data points]
Time Series Classification
- Applications: insect species, heartbeats, etc.
- 1-Nearest Neighbor classifier
- Distance measures: Euclidean, LCSS, DTW, ...
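The 1-Nearest Neighbor classifier named above can be sketched in a few lines; this is a minimal illustration with a pluggable distance function, not the authors' code, and the names `knn1` and `euclidean` are ours.

```python
import math

def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

def knn1(train, labels, query, dist=euclidean):
    """Return the label of the training series nearest to `query`.

    Any distance measure (Euclidean, LCSS, DTW, ...) can be passed as `dist`.
    """
    best = min(range(len(train)), key=lambda i: dist(query, train[i]))
    return labels[best]
```

Classifying a whole test set this way costs one distance computation per training instance per query, which is what makes numerosity reduction attractive.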
Dynamic Time Warping

DTW(Q, C) = min sqrt( sum_{k=1..K} w_k )

where the minimum is taken over all warping paths w between Q and C, each w_k is the squared distance between the aligned points of the k-th path element, and the path is constrained to a Sakoe-Chiba Band with warping window r.

[Figure: the one-to-one Euclidean alignment of Q and C versus the elastic Dynamic Time Warping alignment]
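The DTW definition above can be computed with the standard dynamic program; this is a minimal sketch (our naming, not the talk's code) in which `r` is the Sakoe-Chiba warping window as a fraction of the series length.

```python
import math

def dtw(q, c, r=1.0):
    """DTW distance between series q and c, Sakoe-Chiba window fraction r."""
    n, m = len(q), len(c)
    w = max(int(r * max(n, m)), abs(n - m))  # band half-width in points
    INF = float('inf')
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # only cells within the band around the diagonal are filled in
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

With r = 0 and equal-length series the band collapses to the diagonal and dtw reduces to the Euclidean distance, which is why the warping window interpolates between the two measures.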
[Figure: outlines of the Flat-tailed Horned Lizard and the Texas Horned Lizard, aligned by Dynamic Time Warping]
Observation I
1-Nearest Neighbor with DTW distance is hard to beat in time series classification.
[Figure: comparison of classification error rates (%) on Two Patterns, Control Chart, and FORTE-2. 1NN-DTW achieves 0.0, 0.33, and 9.09 respectively, against rival methods including Decision Tree (4.9), First-order logic rule (3.6), Multi-layer perceptron neural network (1.9), Super-kernel fusion scheme (0.79), Multiple classifier (7.2), Multi-scale histogram (6.0), and Grammar-guided feature extraction (13.22)]
Observation I
- 1-NN DTW achieves high accuracy but is slow
- 1-NN needs O(N^2) distance computations, where N is the dataset size
- DTW itself is computationally expensive
- Can we speed up 1-NN DTW?
Observation II
- As the data size decreases, a larger warping window achieves higher accuracy (numerosity reduction)
- The accuracy peaks very early, at a small warping window (accelerate DTW computation)

[Figure: Sakoe-Chiba band of width d between Q and C; accuracy (%) versus warping window r (%) on Gun-Point, for subsets of 6, 12, 24, 50, and 100 instances]
Dataset               DTW (%)   Warping window (%)   Euclidean (%)
Gun-Point              99.00          3                 94.50
Trace                 100.00          3                 89.00
Two Patterns          100.00          3                 98.96
CBF                   100.00          1                 97.67
Control Chart          99.67          8                 92.50
Face                   96.43          3                 94.64
Leaf                   96.15          8                 66.74
ECG                    90.00          1                 90.00
Pulse                  99.00          1                 99.00
Lightning (FORTE-2)    90.91          5                 75.21
Wafer                  99.93          1                 99.90
Word Spotting          80.00          3                 70.06
HapticX                65.00         15                 60.00

DTW gives better accuracy than Euclidean distance, and the accuracy peaks very early, at a small warping window.
Speed Up 1-NN DTW
- Numerosity reduction (data editing): search in a heuristic order, pruning the worst exemplar first (Naive Rank)
- Dynamically adjust the warping window
Naïve Rank Reduction
- Assign a rank to each instance
- Prune the instance with the lowest rank
- Reassign ranks to the remaining instances
- Repeat the above steps until the stopping criterion is met

Each instance x is scored by the instances x_j associated with it (those for which x is the nearest neighbor):

rank(x) = sum_j ( +1 if x_j has the same class as x, -2 otherwise )

Ties are broken by the weight

weight(x) = sum_j 1 / d(x, x_j)^2

Example: an instance P with three same-class and one different-class associated instances gets Rank(P) = 1 + 1 + 1 - 2 = 1.
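The Naive Rank scoring above can be sketched as follows. This is our illustrative reading of the scheme, assuming each instance is scored by the instances whose nearest neighbor it is (+1 for a same-class "voter", -2 otherwise), with the inverse-squared-distance weight accumulated for tie-breaking; `naive_rank` is our name.

```python
import math

def euclidean(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def naive_rank(data, labels, dist=euclidean):
    """Return a (rank, weight) pair per instance; prune the lowest first."""
    n = len(data)
    ranks = [0] * n
    weights = [0.0] * n
    for j in range(n):
        # nearest neighbor of instance j among the other instances
        nn = min((i for i in range(n) if i != j),
                 key=lambda i: dist(data[j], data[i]))
        d = dist(data[j], data[nn])
        ranks[nn] += 1 if labels[j] == labels[nn] else -2
        weights[nn] += 1.0 / (d * d) if d > 0 else float('inf')
    return list(zip(ranks, weights))
```

Numerosity reduction then repeatedly removes the instance with the minimal (rank, weight) pair and re-scores the remainder.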
Adaptive Warping Window
Basic idea: adjust the warping window dynamically during numerosity reduction.
- Prune instances one at a time
- Increase the warping band by 1% if necessary

[Figure: as the data is pruned from 1,000 instances down (999, 998, 997, ...), the warping window search over 1%-15% tracks the best accuracy (e.g. 99%, 98%, 97%, ...), widening the band as the dataset shrinks]
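The prune-then-adjust loop just described can be sketched as below. Everything here is an assumption for illustration: `rank_instances` is any scorer returning one comparable score per instance (lower is pruned first, e.g. Naive Rank), `loo_accuracy` is an assumed leave-one-out accuracy callback, and the 15% cap and stopping size are placeholders.

```python
def adaptive_prune(data, labels, loo_accuracy, rank_instances,
                   r=1, r_max=15, keep=10):
    """Yield (dataset size, window r in %) as instances are pruned."""
    data, labels = list(data), list(labels)
    while len(data) > keep:
        # prune the worst-ranked instance
        scores = rank_instances(data, labels)
        worst = min(range(len(data)), key=lambda i: scores[i])
        del data[worst], labels[worst]
        # widen the band by 1% while that improves leave-one-out accuracy
        while (r < r_max and
               loo_accuracy(data, labels, r + 1) > loo_accuracy(data, labels, r)):
            r += 1
        yield len(data), r
```

Because the window only ever grows as the dataset shrinks, the loop mirrors Observation II: small datasets end up paired with wide warping windows.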
Speeding Up DTW Classification
Solution:
- LB_Keogh lower bounding, amortized cost O(n)
- Store the DTW distance matrix and the nearest-neighbor matrix, and update them dynamically
- Compute accuracy by looking up the matrices
- 4 or 5 orders of magnitude faster
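The LB_Keogh bound used above can be sketched as follows: build upper and lower envelopes of the query over the warping window and sum the squared amounts by which the candidate escapes the envelope. The true DTW distance can never be smaller, so candidates whose bound already exceeds the best-so-far are discarded without running DTW. This is a minimal illustration, not the authors' implementation.

```python
import math

def lb_keogh(q, c, w):
    """Lower bound on DTW(q, c) with warping window `w` (in points)."""
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - w):i + w + 1]
        lo, hi = min(window), max(window)  # envelope of q at position i
        if ci > hi:
            total += (ci - hi) ** 2
        elif ci < lo:
            total += (lo - ci) ** 2
    return math.sqrt(total)
```

Recomputing min/max per position makes this sketch O(nw); the amortized O(n) cost quoted on the slide comes from computing the envelopes once per query with a sliding-window min/max.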
Experimental Results

[Figure: accuracy (%) versus number of data instances on the Two Patterns training set, comparing Random pruning with Euclidean distance, Random pruning with a fixed window, NaiveRank with a fixed window, and NaiveRank with the adaptive window; annotations (4%-14%) show the warping windows selected as the data shrinks, with 1-NN Euclidean, 1-NN DTW (fixed window), and 1-NN DTW (adaptive) marked for reference]
Experimental Results

[Figure: accuracy (%) versus number of data instances on the Two Patterns and Swedish Leaf test sets, comparing RT1, RT2, RT3, 1-NN DTW, and 1-NN Euclidean; annotations show the selected warping windows (3%-14%)]

The RT algorithms are introduced in Wilson, D.R. & Martinez, T.R. (1997). Instance Pruning Techniques. ICML'97.
Conclusions
- 1-NN DTW is very competitive in time series classification
- We show novel observations on the relationship between warping window size and dataset size
- We produce an extremely fast, accurate classifier
Thank You!

Two Patterns, 1,000 training / 4,000 test:

                                     Fast DTW   LB_Keogh   Brute force
DTW computations (during process)    2x10^3     3x10^8     7x10^8
Process time (sec)                   1x10^1     1x10^6     3x10^6