ICML 2006, CMU, Pittsburgh
Fast Time Series Classification Using Numerosity Reduction
Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, Chotirat Ann Ratanamahatana
Department of Computer Science and Engineering, University of California, Riverside
June 28, 2006
Overview
- Background
- Motivation
- Naïve rank reduction
- Adaptive warping window in DTW
- Experimental results
- Conclusions
Time Series

[Figure: three example time series, labeled A, B, and C: the outline of a Flat-tailed Horned Lizard, an ECG heartbeat, and a stock price, each over roughly 1,400 data points]
Time Series Classification
- Applications: insect species, heartbeats, etc.
- 1-Nearest Neighbor classifier
- Distance measures: Euclidean, LCSS, DTW, ...
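The 1-Nearest Neighbor classifier named above can be sketched in a few lines; this is a minimal illustration with a pluggable distance function, not the authors' code, and the names `knn1` and `euclidean` are ours.

```python
import math

def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

def knn1(train, labels, query, dist=euclidean):
    """Return the label of the training series nearest to `query`.

    Any distance measure (Euclidean, LCSS, DTW, ...) can be passed as `dist`.
    """
    best = min(range(len(train)), key=lambda i: dist(query, train[i]))
    return labels[best]
```

Classifying a whole test set this way costs one distance computation per training instance per query, which is what makes numerosity reduction attractive.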
Dynamic Time Warping

DTW(Q, C) = min sqrt( sum_{k=1..K} w_k )

where the minimum is taken over all warping paths w between Q and C, each w_k is the squared distance between the aligned points of the k-th path element, and the path is constrained to a Sakoe-Chiba Band with warping window r.

[Figure: the one-to-one Euclidean alignment of Q and C versus the elastic Dynamic Time Warping alignment]
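The DTW definition above can be computed with the standard dynamic program; this is a minimal sketch (our naming, not the talk's code) in which `r` is the Sakoe-Chiba warping window as a fraction of the series length.

```python
import math

def dtw(q, c, r=1.0):
    """DTW distance between series q and c, Sakoe-Chiba window fraction r."""
    n, m = len(q), len(c)
    w = max(int(r * max(n, m)), abs(n - m))  # band half-width in points
    INF = float('inf')
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # only cells within the band around the diagonal are filled in
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

With r = 0 and equal-length series the band collapses to the diagonal and dtw reduces to the Euclidean distance, which is why the warping window interpolates between the two measures.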
[Figure: outlines of the Flat-tailed Horned Lizard and the Texas Horned Lizard, aligned by Dynamic Time Warping]
Observation I
1-Nearest Neighbor with DTW distance is hard to beat in time series classification.
[Figure: comparison of classification error rates (%) on Two Patterns, Control Chart, and FORTE-2. 1NN-DTW achieves 0.0, 0.33, and 9.09 respectively, against rival methods including Decision Tree (4.9), First-order logic rule (3.6), Multi-layer perceptron neural network (1.9), Super-kernel fusion scheme (0.79), Multiple classifier (7.2), Multi-scale histogram (6.0), and Grammar-guided feature extraction (13.22)]
Observation I
- 1-NN DTW achieves high accuracy but is slow
- 1-NN needs O(N^2) distance computations, where N is the dataset size
- DTW itself is computationally expensive
- Can we speed up 1-NN DTW?
Observation II
- As the data size decreases, a larger warping window achieves higher accuracy (numerosity reduction)
- The accuracy peaks very early, at a small warping window (accelerate DTW computation)

[Figure: Sakoe-Chiba band of width d between Q and C; accuracy (%) versus warping window r (%) on Gun-Point, for subsets of 6, 12, 24, 50, and 100 instances]
Dataset               DTW (%)   Warping window (%)   Euclidean (%)
Gun-Point              99.00          3                 94.50
Trace                 100.00          3                 89.00
Two Patterns          100.00          3                 98.96
CBF                   100.00          1                 97.67
Control Chart          99.67          8                 92.50
Face                   96.43          3                 94.64
Leaf                   96.15          8                 66.74
ECG                    90.00          1                 90.00
Pulse                  99.00          1                 99.00
Lightning (FORTE-2)    90.91          5                 75.21
Wafer                  99.93          1                 99.90
Word Spotting          80.00          3                 70.06
HapticX                65.00         15                 60.00

DTW gives better accuracy than Euclidean distance, and the accuracy peaks very early, at a small warping window.
Speed Up 1-NN DTW
- Numerosity reduction (data editing): search in a heuristic order, pruning the worst exemplar first (Naive Rank)
- Dynamically adjust the warping window
Naïve Rank Reduction
- Assign a rank to each instance
- Prune the instance with the lowest rank
- Reassign ranks to the remaining instances
- Repeat the above steps until the stopping criterion is met

Each instance x is scored by the instances x_j associated with it (those for which x is the nearest neighbor):

rank(x) = sum_j ( +1 if x_j has the same class as x, -2 otherwise )

Ties are broken by the weight

weight(x) = sum_j 1 / d(x, x_j)^2

Example: an instance P with three same-class and one different-class associated instances gets Rank(P) = 1 + 1 + 1 - 2 = 1.
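The Naive Rank scoring above can be sketched as follows. This is our illustrative reading of the scheme, assuming each instance is scored by the instances whose nearest neighbor it is (+1 for a same-class "voter", -2 otherwise), with the inverse-squared-distance weight accumulated for tie-breaking; `naive_rank` is our name.

```python
import math

def euclidean(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def naive_rank(data, labels, dist=euclidean):
    """Return a (rank, weight) pair per instance; prune the lowest first."""
    n = len(data)
    ranks = [0] * n
    weights = [0.0] * n
    for j in range(n):
        # nearest neighbor of instance j among the other instances
        nn = min((i for i in range(n) if i != j),
                 key=lambda i: dist(data[j], data[i]))
        d = dist(data[j], data[nn])
        ranks[nn] += 1 if labels[j] == labels[nn] else -2
        weights[nn] += 1.0 / (d * d) if d > 0 else float('inf')
    return list(zip(ranks, weights))
```

Numerosity reduction then repeatedly removes the instance with the minimal (rank, weight) pair and re-scores the remainder.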
Adaptive Warping Window
Basic idea: adjust the warping window dynamically during numerosity reduction.
- Prune instances one at a time
- Increase the warping band by 1% if necessary

[Figure: as the data is pruned from 1,000 instances down (999, 998, 997, ...), the warping window search over 1%-15% tracks the best accuracy (e.g. 99%, 98%, 97%, ...), widening the band as the dataset shrinks]
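The prune-then-adjust loop just described can be sketched as below. Everything here is an assumption for illustration: `rank_instances` is any scorer returning one comparable score per instance (lower is pruned first, e.g. Naive Rank), `loo_accuracy` is an assumed leave-one-out accuracy callback, and the 15% cap and stopping size are placeholders.

```python
def adaptive_prune(data, labels, loo_accuracy, rank_instances,
                   r=1, r_max=15, keep=10):
    """Yield (dataset size, window r in %) as instances are pruned."""
    data, labels = list(data), list(labels)
    while len(data) > keep:
        # prune the worst-ranked instance
        scores = rank_instances(data, labels)
        worst = min(range(len(data)), key=lambda i: scores[i])
        del data[worst], labels[worst]
        # widen the band by 1% while that improves leave-one-out accuracy
        while (r < r_max and
               loo_accuracy(data, labels, r + 1) > loo_accuracy(data, labels, r)):
            r += 1
        yield len(data), r
```

Because the window only ever grows as the dataset shrinks, the loop mirrors Observation II: small datasets end up paired with wide warping windows.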
Speeding Up DTW Classification
Solution:
- LB_Keogh lower bounding, amortized cost O(n)
- Store the DTW distance matrix and the nearest-neighbor matrix, and update them dynamically
- Compute accuracy by looking up the matrices
- 4 or 5 orders of magnitude faster
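The LB_Keogh bound used above can be sketched as follows: build upper and lower envelopes of the query over the warping window and sum the squared amounts by which the candidate escapes the envelope. The true DTW distance can never be smaller, so candidates whose bound already exceeds the best-so-far are discarded without running DTW. This is a minimal illustration, not the authors' implementation.

```python
import math

def lb_keogh(q, c, w):
    """Lower bound on DTW(q, c) with warping window `w` (in points)."""
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - w):i + w + 1]
        lo, hi = min(window), max(window)  # envelope of q at position i
        if ci > hi:
            total += (ci - hi) ** 2
        elif ci < lo:
            total += (lo - ci) ** 2
    return math.sqrt(total)
```

Recomputing min/max per position makes this sketch O(nw); the amortized O(n) cost quoted on the slide comes from computing the envelopes once per query with a sliding-window min/max.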
Experimental Results

[Figure: accuracy (%) versus number of data instances on the Two Patterns training set, comparing Random pruning with Euclidean distance, Random pruning with a fixed window, NaiveRank with a fixed window, and NaiveRank with the adaptive window; annotations (4%-14%) show the warping windows selected as the data shrinks, with 1-NN Euclidean, 1-NN DTW (fixed window), and 1-NN DTW (adaptive) marked for reference]
Experimental Results

[Figure: accuracy (%) versus number of data instances on the Two Patterns and Swedish Leaf test sets, comparing RT1, RT2, RT3, 1-NN DTW, and 1-NN Euclidean; annotations show the selected warping windows (3%-14%)]

The RT algorithms are introduced in Wilson, D.R. & Martinez, T.R. (1997). Instance Pruning Techniques. ICML'97.
Conclusions
- 1-NN DTW is very competitive in time series classification
- We show novel observations on the relationship between warping window size and dataset size
- We produce an extremely fast, accurate classifier
Thank You!

Two Patterns, 1,000 training / 4,000 test:

                                     Fast DTW   LB_Keogh   Brute force
DTW computations (during process)    2x10^3     3x10^8     7x10^8
Process time (sec)                   1x10^1     1x10^6     3x10^6