
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency

Anthony Bagnall1, Franz J. Király2, Markus Löning2, Matthew Middlehurst1, and George Oastler1

1 University of East Anglia
2 University College London & The Alan Turing Institute

Abstract. sktime is an open source, Python based, sklearn compatible toolkit for time series analysis developed by researchers at the University of East Anglia (UEA), University College London and the Alan Turing Institute. A key initial goal for sktime was to provide time series classification functionality equivalent to that available in a related Java package, tsml, also developed at UEA. We describe the implementation of six such classifiers in sktime and compare them to their tsml equivalents. We demonstrate correctness through equivalence of accuracy on a range of standard test problems and compare the build time of the different implementations. We find a significant difference in accuracy on only one of the six algorithms we look at (Proximity Forest [19]). This difference is causing us some pain in debugging. We found a much wider range of difference in efficiency. Again, this was not unexpected, but it does highlight ways both toolkits could be improved.

1 Introduction

sktime3 is an open source, Python based, sklearn compatible toolkit for time series analysis developed by researchers at the University of East Anglia, University College London and the Alan Turing Institute. sktime is designed to provide a unifying API for a range of time series tasks such as annotation, prediction and forecasting (see [18] for a description of the overarching design of sktime). A key prediction component is algorithms for time series classification (TSC). We have implemented a broad range of TSC algorithms within sktime. This technical report provides evidence of the correctness of the implementation and assesses the time efficiency of these classifiers by comparing accuracy and run time to the Java versions within the tsml toolkit4 used in an extensive experimental comparison published in 2017 [3]. This is a work in progress, and this document will be updated as we improve and add functionality.

3 https://github.com//alan-turing-institute/sktime
4 https://github.com/uea-machine-learning/tsml


2 sktime structure

sktime is organised into high level packages to reflect the different tasks, shown in Figure 1. We are concerned with the classifiers package, the components of which make use of distance measures and transformers. Note that in the dev branch there is a package contrib. This contains implementations that have not been fully integrated yet and bespoke code to run experiments.

Fig. 1. Overview of the current sktime file structure (sktime version 0.3).

Classifiers are grouped based on their core transformation technique. The taxonomy of classifiers is taken from the comparative study presented in [3]. Currently, the following classifiers are available.

interval_based is a package for classifiers built on intervals of the series. Currently, it contains a single classifier, the Time Series Forest [8], in file tsf.py. This is evaluated in Section 3. Spectral based classifiers are located in the package frequency_based. The random interval spectral ensemble (RISE) [17] is as yet the only concrete implementation of this type and it can be found in rise.py

(see Section 4). dictionary_based contains classifiers that form histograms of discretised words. We have the Bag of SFA Symbols (BOSS) [23] classifier, implemented as BOSSEnsemble in boss.py, with experiments described in Section 5. Shapelets are discriminatory subseries, and classifiers that use them are in the shapelet_based package. We evaluate a version of the shapelet transform classifier [4] implemented in the file stc.py.


Fig. 2. Overview of the current sktime classifier packages (sktime version 0.3).

In time_series_neighbors.py there is KNeighborsTimeSeriesClassifier, an sklearn based nearest neighbour (NN) classifier that can be configured to use a range of distance measures, including: dynamic time warping (DTW); derivative dynamic time warping (DDTW) [10]; weighted DTW [13]; longest common subsequence; edit distance with real penalty (ERP); and move-split-merge [26]. Until recently, DTW with 1-NN was by far the most commonly used benchmark algorithm for time series classification. These distance functions are available in the package distances. We evaluate the ElasticEnsemble [16], an ensemble of nearest neighbour classifiers using elastic distance measures, which is in elastic_ensemble.py. Furthermore, in proximity.py there is the ProximityForest [19], a tree based classifier using a range of distance measures. DTW, EE and PF classifiers are evaluated in Section 7. The sktime experiments described in the following sections can be reproduced using the code in contrib/basic_benchmarking.py.

3 Interval Based: Time Series Forest (TSF) [8]

TSF5 is probably the simplest bespoke classifier and hence was the first algorithm we implemented. TSF is an ensemble of tree classifiers, each constructed on a different feature space. For any given ensemble member, a set of random intervals is selected, and three summary statistics (mean, standard deviation and slope)

5 http://www.timeseriesclassification.com/algorithmdescription.php?algorithm_id=8


are calculated for each interval. The statistics for each interval are concatenated to form the new feature space.

In sktime, there are two ways of constructing a TSF estimator. The hard-coded implementation TimeSeriesForest is in the file tsf.py. An example usage, including data loading and building, assuming full_path is where the problems reside6, is as follows:

from sktime.utils.load_data import load_from_tsfile_to_dataframe as load_ts
import sktime.classifiers.interval_based.tsf as ib

# method 1: fixed classifier
tsf = ib.TimeSeriesForest(n_trees=100)
trainX, trainY = load_ts(full_path + '_TRAIN.ts')
testX, testY = load_ts(full_path + '_TEST.ts')
tsf.fit(trainX, trainY)
predictions = tsf.predict(testX)
probabilities = tsf.predict_proba(testX)

Alternatively, you can configure the TimeSeriesForestClassifier in classifiers.compose.ensemble.py as a TSF as follows:

# method 2: configurable classifier
import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier
from sktime.transformers.compose import RowwiseTransformer
from sktime.transformers.segment import RandomIntervalSegmenter
from sktime.pipeline import Pipeline, FeatureUnion
from sktime.classifiers.compose import TimeSeriesForestClassifier
from sktime.utils.time_series import time_series_slope  # assumed location of the slope helper

steps = [
    ('segment', RandomIntervalSegmenter(n_intervals='sqrt')),
    ('transform', FeatureUnion([
        ('mean', RowwiseTransformer(FunctionTransformer(func=np.mean, validate=False))),
        ('std', RowwiseTransformer(FunctionTransformer(func=np.std, validate=False))),
        ('slope', RowwiseTransformer(FunctionTransformer(func=time_series_slope, validate=False)))
    ])),
    ('clf', DecisionTreeClassifier())
]
base_estimator = Pipeline(steps)
tsf = TimeSeriesForestClassifier(base_estimator=base_estimator, n_estimators=100)

The validate=False is to stop the built-in methods using the sktime data format incorrectly (and to suppress annoying warnings). It will be set to False by default in the next release. The configurable version allows for the easy formulation of TSF-like variants and other transformation based ensembles. However,

6 See https://github.com/alan-turing-institute/sktime/blob/master/examples/loading_data.ipynb for examples of how to load data.


this comes at the cost of efficiency. To verify the implementations, we compare the tsml Java version7 used in the bake off [3] to the two sktime versions.

We aim to show there is no difference in accuracy between the three implementations. We evaluate these three classifiers on 30 stratified resamples of 111 datasets from the UCR archive [6].

Full results are available on the associated website8. Figure 3 shows the critical difference diagrams [7] for the three versions for accuracy, balanced accuracy, area under the receiver operator curve (AUC) and the negative log likelihood (NLL). Solid bars indicate 'cliques', within which there is no significant difference between members. A fuller explanation is provided in [2]. Figure 3 demonstrates there is no significant difference between the classifiers in accuracy, balanced accuracy and NLL. The only observed difference is that sktime1 is significantly better than sktime2 with AUC. This may be explainable by the voting method employed by the two algorithms being different, although we have not confirmed it. We suspect this is just a chance result. The actual differences in AUC are very small.
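For readers who want to reproduce this style of comparison, the sketch below shows one way the pairwise test could be computed with scipy and statsmodels. It is illustrative only and is not the evaluation code used to generate the figures; the pairwise_wilcoxon_holm helper and the acc_by_clf input format are assumptions.

from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def pairwise_wilcoxon_holm(acc_by_clf, alpha=0.05):
    # acc_by_clf: dict mapping classifier name to a list of accuracies,
    # one per dataset, in the same dataset order for every classifier
    names = list(acc_by_clf)
    pairs, p_values = [], []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            _, p = wilcoxon(acc_by_clf[names[i]], acc_by_clf[names[j]])
            pairs.append((names[i], names[j]))
            p_values.append(p)
    # Holm correction over all pairwise p-values
    reject, p_adj, _, _ = multipletests(p_values, alpha=alpha, method='holm')
    return list(zip(pairs, p_adj, reject))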

[Figure: four critical difference diagrams. Average ranks: (a) Accuracy: sktime2 1.8919, sktime1 1.9865, tsml 2.1216; (b) Balanced Accuracy: sktime2 1.9054, sktime1 2.0135, tsml 2.0811; (c) AUC: sktime1 1.9099, tsml 1.991, sktime2 2.0991; (d) NLL: sktime2 1.955, tsml 1.964, sktime1 2.0811.]

Fig. 3. Critical difference diagrams for three TSF implementations. Two from sktime (sktime1 using method 1 and sktime2 using method 2) and one from tsml. There is no significant difference between the three classifiers (as tested with a pairwise Wilcoxon test with Holm correction, α = 0.05).

Figures 5 and 6 show the pairwise scatter plots of test accuracy for sktime2 against sktime1 and tsml. A timing comparison for the three classifiers is shown in Figure 7.

The configurable version is an order of magnitude slower than the fixed version, which is approximately equivalent to the Java based tsml. We are not entirely sure as to the reason for this slowdown with the composite. We suspect it is due to excessive unpacking and packing of data into pandas, but it

7 https://github.com/uea-machine-learning/tsml/
8 http://www.timeseriesclassification.com/sktime.php


Fig. 4. TSF sktime1 vs sktime2 scatterplot of AUC. sktime2 Wins/Draws/Losses: 41/5/65.

Fig. 5. TSF sktime1 vs sktime2 scatterplot of test accuracies. sktime2 Wins/Draws/Losses: 59/5/47.

merits further investigation and detailed profiling, particularly as the accuracy is marginally, although not significantly, higher.


Fig. 6. TSF tsml vs sktime2 scatterplot of test accuracies. sktime2 Wins/Draws/Losses: 60/3/48.

4 Frequency Based: RISE [17]

RISE is similar to TSF in that it is an ensemble of decision tree classifiers. The difference lies in the feature space used for each base classifier. RISE selects a single random interval for each base classifier, then transforms the time series in that interval into power spectrum and autocorrelation features. As with TSF, there are two ways of building RISE.


[Figure: train time (ms, log scale) per dataset for the three TSF classifiers, datasets ordered by average train time, D=111.]

Fig. 7. Train time for three TSF classifiers: sktime1 (data1); tsml (data2); and sktime2 (data3).

import numpy as np
import sktime.classifiers.frequency_based.rise as fb
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier
from statsmodels.tsa.stattools import acf
from sktime.transformers.compose import RowwiseTransformer
from sktime.transformers.compose import Tabulariser
from sktime.transformers.segment import RandomIntervalSegmenter
from sktime.pipeline import Pipeline
from sktime.pipeline import FeatureUnion
from sktime.classifiers.compose import TimeSeriesForestClassifier

# method 1: fixed classifier
rise = fb.RandomIntervalSpectralForest(n_trees=100)

# helper functions used by the configurable classifier
def acf_coefs(x, maxlag=100):
    x = np.asarray(x).ravel()
    nlags = np.minimum(len(x) - 1, maxlag)
    return acf(x, nlags=nlags).ravel()

def powerspectrum(x, **kwargs):
    x = np.asarray(x).ravel()
    fft = np.fft.fft(x)
    ps = fft.real * fft.real + fft.imag * fft.imag
    return ps[:ps.shape[0] // 2].ravel()

# method 2: configurable classifier
steps = [
    ('segment', RandomIntervalSegmenter(n_intervals=1, min_length=5)),
    ('transform', FeatureUnion([
        ('acf', RowwiseTransformer(FunctionTransformer(func=acf_coefs, validate=False))),
        ('ps', RowwiseTransformer(FunctionTransformer(func=powerspectrum, validate=False)))
    ])),
    ('tabularise', Tabulariser()),
    ('clf', DecisionTreeClassifier())
]
base_estimator = Pipeline(steps)
rise = TimeSeriesForestClassifier(base_estimator=base_estimator, n_estimators=100)

The composite version requires the definition of the acf and power spectrum functions. We will make this better encapsulated in the next release. To validate these implementations, we again benchmark against the tsml Java version, using RISE with 50 trees. Using the built-in numpy functions for sktime2 creates a problem with some datasets: if a zero variance interval is passed to the numpy functions, an exception is thrown. In sktime1 and tsml we can manually adjust for this circumstance. The nature of the composite sktime2 makes this harder to manage. This means we are only able to compare on 68 datasets. Figures 8, 9 and 10 show there is no significant difference in accuracy between the three classifiers. However, there is greater variance in the difference in accuracy than observed for TSF.
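One way to work around the zero variance issue in the composite version would be to guard the acf call. The helper below is a hypothetical sketch (safe_acf_coefs is not part of sktime) assuming the numpy and statsmodels imports from the code above.

def safe_acf_coefs(x, maxlag=100):
    # hypothetical guard, not part of sktime: a constant interval has no
    # autocorrelation structure, so return zeros instead of calling acf
    x = np.asarray(x).ravel()
    nlags = np.minimum(len(x) - 1, maxlag)
    if np.std(x) == 0:
        return np.zeros(nlags + 1)
    return acf(x, nlags=nlags).ravel()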

The timing comparison for RISE, shown in Figure 11, highlights two curious characteristics. Firstly, sktime2 is an order of magnitude faster than both sktime1 and tsml. We believe this is due to the efficient numpy implementations used, although it merits further investigation. Secondly, tsml has much higher variance than the sktime implementations, and seems to take an unexpectedly long time on some small problems. This also merits further investigation.


[Figure: critical difference diagram; average ranks: tsml 1.8456, sktime2 2.0588, sktime1 2.0956.]

Fig. 8. Average ranks for three RISE implementations. Two from sktime (sktime1 using method 1 and sktime2 using method 2) and one from tsml. There is no significant difference between the three classifiers (as tested with a pairwise Wilcoxon test with Holm correction, α = 0.05).

Fig. 9. RISE sktime2 vs sktime1 scatterplot of test accuracies. sktime2 Wins/Draws/Losses: 34/5/29.

5 Dictionary Based: BOSS [23]

The BOSS implementation contains two components, BOSSIndividual and the BOSSEnsemble (what we refer to as BOSS). Individual BOSS classifiers create a histogram of words from each series using the SFA transform over sliding windows. A one nearest neighbour classifier using a bespoke BOSS distance is then used for classification. The BOSSEnsemble is an ensemble of BOSSIndividual

classifiers. The ensemble members are selected through a grid search of


Fig. 10. RISE tsml vs sktime2 scatterplot of test accuracies. sktime2 Wins/Draws/Losses: 25/5/38.

individual BOSS parameters, with classifiers falling below 92% of the accuracy of the best classifier removed.

We also implement a faster and more configurable version of the classifier, cBOSS [21], replacing the grid search with a filtered random selection. This alternative method can be employed with the BOSSEnsemble by setting the randomised_ensemble parameter to true. Two parameters are associated with this version: the number of individual BOSS classifiers to be built, n_parameter_samples,

and the maximum number of classifiers in the ensemble, max_ensemble_size. cBOSS is also contractable, allowing a unit of time in minutes given via the time_limit parameter to replace n_parameter_samples.

Examples of how to construct BOSS and cBOSS are demonstrated in the following code sample.


[Figure: train time (ms, log scale) per dataset for the three RISE classifiers, datasets ordered by average train time, D=68.]

Fig. 11. Train time for three RISE classifiers: sktime1 (data2); tsml (data3); and sktime2 (data1).

import sktime.classifiers.dictionary_based.boss as db

# By default the original BOSS algorithm is set up
boss = db.BOSSEnsemble()

# Configuration for recommended cBOSS settings
c_boss = db.BOSSEnsemble(randomised_ensemble=True,
                         n_parameter_samples=250, max_ensemble_size=50)

# cBOSS contracted for 1 hour, input time must be in minutes
c_boss_contract = db.BOSSEnsemble(randomised_ensemble=True,
                                  time_limit=60, max_ensemble_size=50)

Our primary goal is to test the correctness and efficiency of the standard implementations. To this end, we compare the implementations of BOSS in sktime and tsml. Figure 12 plots the accuracy of the two classifiers on 92 datasets. The accuracies are very similar (63 are identical) and the differences are insignificant. We expected less variation in accuracy with the nearest neighbour classifier than the tree based classifiers, because there are fewer stochastic elements. This


is indeed the case. However, the timing results displayed in Figure 13 show how much slower the sktime implementation is: approximately 10-40 times slower than tsml. This was also not unexpected. The implementation is mapped directly from Java and does not exploit any of the efficiency improvements that can be used in Python. This is required future work.

Fig. 12. BOSS sktime vs tsml scatterplot of test accuracies. sktime Wins/Draws/Losses: 12/63/12.

6 Shapelet Based: Shapelet Transform Classifier (STC) [11]

The shapelet transform classifier (ShapeletTransformClassifier in file stc.py) is a single pipeline of a shapelet transform (transformers/shapelets.py) and a classifier (by default, a random forest with 500 trees). The original shapelet transform performed a complete enumeration of the shapelet space, and used a heterogeneous ensemble for a classifier [11]. More recent research has shown that equivalent accuracy can be obtained through randomly searching for shapelets rather than a full enumeration [4]. The shapelet transform in sktime is contractable, in that you can specify an amount of time to search for shapelets before performing the transform. The default is 300 minutes. The latest STC in tsml uses a rotation forest classifier, since it has been shown to be best for this type of problem [2]. Rotation forest is not yet available in Python (we are working on an alpha version), hence we use random forest for our comparison.
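A minimal sketch of constructing the fixed classifier is given below. The module path follows the package layout in Section 2, the contract keyword argument is an assumption (check stc.py for the exact parameter name), and trainX, trainY and testX are as loaded in the Section 3 example.

from sktime.classifiers.shapelet_based.stc import ShapeletTransformClassifier

# shapelet search contracted to 1000 minutes, as used in the comparison below
# (parameter name assumed; the default contract is 300 minutes)
stc = ShapeletTransformClassifier(time_contract_in_mins=1000)
stc.fit(trainX, trainY)
predictions = stc.predict(testX)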


[Figure: train time (ms, log scale) per dataset for the two BOSS classifiers, datasets ordered by average train time, D=92.]

Fig. 13. Train time for two BOSS classifiers: sktime (data2) and tsml (data1).

We run STC in sktime and tsml with a 1000 minute time limit and use the sklearn and Weka random forest implementations as the final classifier. Given both classifiers are contracted, we only compare the accuracies. Figure 14 shows that, overall, there is no significant difference between the two implementations. However, there is still some considerable variation. This may be down to evaluation over a single resample, or related to core differences in the underlying implementation. Further investigation is required.

7 Distance Based: Elastic Ensemble (EE) and Proximity Forest (PF)

7.1 Elastic Ensemble [16]

The Elastic Ensemble (EE) [16] ensembles 11 different time series distance measures, each coupled with a 1-nearest-neighbour (NN) classifier, together known as the constituents of EE. The variation in distance measure of each constituent injects diversity into the ensemble and provides superior classification performance over any individual constituent alone. EE uses the following distance measures: Euclidean distance, Dynamic Time Warping (DTW), Derivative DTW (DDTW) [15], DTW with cross-validated warping window (DTWCV) [22], DDTW with cross-validated warping window (DDTWCV), Longest Common Sub-Sequence


Fig. 14. STC sktime vs tsml scatterplot of test accuracies. sktime Wins/Draws/Losses: 48/11/52.

(LCSS) [12], Edit Distance with Real Penalty (ERP) [5], Move-Split-Merge (MSM) [26], and Time Warp Edit Distance (TWED) [20]. The latter eight of these distance measures benefit from hyper-parameter tuning. By default, EE employs a tuning process for each constituent over 100 parameter options using leave-one-out cross-validation (LOOCV).

Comparing long time series using distance measures can be time consuming, therefore both a Cython and a Python version are provided in sktime. These are in distances.elastic.py and distances.elastic_cython.pyx. Further, sktime provides an extension to scikit-learn's KNeighborsClassifier to enable usage with pandas dataframes and custom distance measures. The tuning of each constituent of EE can also be time consuming, therefore we provide the capability to reduce the number of neighbours used in the examination of parameter options and to reduce the number of parameters per constituent overall. Both of these have been shown to significantly speed up EE with no significant loss in classification performance.


from sktime.classifiers.distance_based.elastic_ensemble import ElasticEnsemble
from sktime.classifiers.distance_based.time_series_neighbors import KNeighborsTimeSeriesClassifier
from sklearn.model_selection import GridSearchCV, LeaveOneOut

# 1NN classifier with full DTW as the distance measure
dtw_nn = KNeighborsTimeSeriesClassifier(metric='dtw', n_neighbors=1)

# 1NN classifier tuned for best accuracy over 100 DTW warping window options
dtwcv_nn = GridSearchCV(
    estimator=KNeighborsTimeSeriesClassifier(metric='dtw', n_neighbors=1),
    param_grid={'metric_params': [{'w': x / 100} for x in range(0, 100)]},
    cv=LeaveOneOut(),
    scoring='accuracy'
)

# Default EE with full tuning effort
ee = ElasticEnsemble()

# Configuration for reducing the tuning effort of EE
ee = ElasticEnsemble(proportion_of_param_options=0.5,
                     proportion_of_train_in_param_finding=0.1)

Even with Cython, the full Elastic Ensemble is very slow in both absolute and relative terms. In practice, we never run it as a single classifier. Rather, we build each constituent in parallel and ensemble after each is complete.
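As an illustration of that workflow, the sketch below fits each constituent in parallel with joblib and then combines the class probability estimates. Note that EE proper weights each constituent by its cross-validation accuracy, whereas this simplified sketch just averages; the fit_and_predict_proba helper is hypothetical and dtw_nn, dtwcv_nn, trainX, trainY and testX are taken from the examples above.

import numpy as np
from joblib import Parallel, delayed

def fit_and_predict_proba(estimator, trainX, trainY, testX):
    # fit one constituent and return its class probability estimates
    estimator.fit(trainX, trainY)
    return estimator.predict_proba(testX)

# extend this list with the remaining tuned 1NN constituents
constituents = [dtw_nn, dtwcv_nn]
all_probas = Parallel(n_jobs=-1)(
    delayed(fit_and_predict_proba)(c, trainX, trainY, testX) for c in constituents)
# unweighted average of the constituent probabilities; argmax gives class indices
predictions = np.argmax(np.mean(all_probas, axis=0), axis=1)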

7.2 Proximity Forest

Proximity Forest (PF) [19] is a competitor to EE and implements a forest of decision trees, known as Proximity Trees (PT), in a Random Forest like structure. A Proximity Tree (PT) partitions data by comparing the relative similarity of instances against a set of exemplar instances. At any given node in the PT, one exemplar instance is randomly picked per class from the data passed down from the parent node. The data is partitioned based upon the similarity of data instances against exemplar instances, grouping similar instances. Each group of instances, along with the corresponding exemplar, is passed down as input data


Fig. 15. EE sktime vs tsml scatterplot of test accuracies.

[Figure: train time (ms, log scale) per dataset for the two EE classifiers, datasets ordered by average train time, D=16.]

Fig. 16. Train time for two EE classifiers: sktime (data2) and tsml (data1).


to a child node and the process repeats until leaf nodes are pure. This process is known as splitting.

The similarity of instances against exemplar instances is determined using distance measures. PF uses the same distance measures and corresponding parameter ranges as EE. Each split in a PT randomly chooses the distance measure and, if required, the corresponding parameter set. This randomness can lead to poor splits, therefore a hyperparameter r controls the evaluation of multiple splits at each node, choosing the best split based upon Gini score. Each split can be seen as a PT of height 1, otherwise known as a Proximity Stump (PS).

The PF generates a set number of PTs and combines their predictions using a majority vote. The extreme randomisation across the PF produces a classifier which can be built in a fraction of the time of EE and which produces competitive predictive performance.

from sktime.classifiers.distance_based.proximity_forest import (
    ProximityForest, ProximityStump, ProximityTree)

# Proximity Stump (PT with 1 level of depth)
ps = ProximityStump()

# Proximity Tree with 5 split evaluations per tree node
pt = ProximityTree(n_stump_evaluations=5)

# Proximity Forest with 100 trees and 5 split evaluations per tree node
pf = ProximityForest(n_trees=100, n_stump_evaluations=5)

8 Conclusions and Future Directions

Developing time series classification algorithms is an active research field. Our goal in providing sktime is to make researching this field easier and more transparent. If researchers use a common framework, reproducibility and comparison to the current state of the art become easy. In this way we hope to drive the field forward to help automatically answer the key question for classifying time series: which is the best algorithm for the task at hand?

The classification component of sktime has broad functionality, but there is much that could be added and improved. Our work plan for the medium term is as follows.


Fig. 17. PF sktime vs tsml scatterplot of test accuracies.

[Figure: train time (ms, log scale) per dataset for the two PF classifiers, datasets ordered by average train time, D=32.]

Fig. 18. Train time for two PF classifiers: sktime (data2) and tsml (data1).


8.1 Improved Classifier Functionality

It has taken a surprising amount of effort to get this far, but there is still much to do. This exercise has highlighted several issues with sktime that need resolving before the next release to master.

1. interval_based: why is the composite version so much slower than the others?
2. frequency_based: can we speed up RISE through matrix operations without exceptions being continually thrown?
3. dictionary_based: can we speed up BOSS through a more pythonesque implementation?
4. shapelet_based: can we improve accuracy by using rotation forest and improve the search efficiency through Cython?
5. Can we improve EE performance through restricting the parameter search?

We also intend to improve the functionality by making all classifiers contractable and by allowing for efficient estimation of the test error from the train data. In addition to improving the existing set of classifiers in these ways, we intend to port in recently proposed alternative TSC algorithms such as FastEE, TS-CHIEF [25], WEASEL [24], Shapelet Forest [14] and HIVE-COTE [17]. We have implemented the range of distance functions and kernels described in [1] and the Matrix Profile distance MPDist [9]. We will evaluate these classifiers to examine how they perform in relation to those already implemented within the toolkit.

8.2 Handle Unequal Length Series

One of the primary motivators for our data model of pandas series objects was to facilitate the easy incorporation of unequal length time series as classification problems. The current suite of available algorithms needs to be adjusted to allow for this use case. For some algorithms this will be fairly simple: BOSS, for example, can simply normalise the histograms, and shapelets are independent of series length. However, these design choices may have an impact on accuracy and experimentation using the enhancements will be required.

8.3 Multivariate Time Series Classifiers

sktime already provides two composition interfaces for solving multivariate time series classification problems, including

– the ColumnConcatenator, a transformer that can be used to concatenate two or more time series columns into a single long time series column, so that one can then apply a classifier to the concatenated, univariate data, and

– the ColumnEnsembleClassifier, a meta-estimator that allows for column-wise ensembling in which a separate classifier is fitted for each time series column and their predictions are aggregated (a usage sketch follows below).
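A minimal sketch of how these two interfaces might be composed is shown below. The module paths for ColumnConcatenator and ColumnEnsembleClassifier, and the (name, estimator, columns) format for the latter, are assumptions based on the package layout described earlier, so check the compose modules for the exact signatures.

from sktime.transformers.compose import ColumnConcatenator
from sktime.classifiers.compose import TimeSeriesForestClassifier
from sktime.classifiers.compose.column_ensembler import ColumnEnsembleClassifier
from sktime.pipeline import Pipeline

# option 1: concatenate the columns, then classify the single long univariate series
concat_clf = Pipeline([
    ('concatenate', ColumnConcatenator()),
    ('classify', TimeSeriesForestClassifier(n_estimators=100))
])

# option 2: fit one classifier per column and aggregate their predictions
column_ens = ColumnEnsembleClassifier(estimators=[
    ('tsf_dim0', TimeSeriesForestClassifier(n_estimators=100), [0]),
    ('tsf_dim1', TimeSeriesForestClassifier(n_estimators=100), [1]),
])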


We have started to implement bespoke methods for specific estimators to handle multivariate time series input data and we aim to add more bespoke methods in future work, e.g. finding shapelets in multidimensional space or extending time series forest to extract features from multidimensional segments (or hyperplanes). There are also a multitude of bespoke methods for classifying time series. We shall draw up a candidate list of these.

8.4 Time series regression

The composite structure of many time series classifiers allows us to refactor them into their regressor counterparts. We have started implementing regression algorithms, including a TSF regressor, and are currently working on adding more regressors.

Acknowledgements

We would like to thank the sktime contributors who are not specific authors on this paper and our colleagues at UCR with whom we maintain the time series data repositories.

References

1. A. Abanda, U. Mori, and J. Lozano. A review on distance based time series classification. Data Mining and Knowledge Discovery, 33(2):378–412, 2019.
2. A. Bagnall, A. Bostrom, G. Cawley, M. Flynn, J. Large, and J. Lines. Is rotation forest the best classifier for problems with continuous features? ArXiv e-prints, arXiv:1809.06705, 2018.
3. A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3):606–660, 2017.
4. A. Bostrom and A. Bagnall. Binary shapelet transform for multiclass time series classification. Transactions on Large-Scale Data and Knowledge Centered Systems, 32:24–46, 2017.
5. L. Chen and R. Ng. On the marriage of Lp-norms and edit distance. In Proc. 30th International Conference on Very Large Databases (VLDB), 2004.
6. H. Dau, A. Bagnall, K. Kamgar, M. Yeh, Y. Zhu, S. Gharghabi, and C. Ratanamahatana. The UCR time series archive. ArXiv e-prints, arXiv:1810.07758, 2018.
7. J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
8. H. Deng, G. Runger, E. Tuv, and M. Vladimir. A time series forest for classification and feature extraction. Information Sciences, 239:142–153, 2013.
9. S. Gharghabi, S. Imani, A. Bagnall, A. Darvishzadeh, and E. Keogh. Matrix Profile XII: MPDist: A novel time series distance measure to allow data mining in more challenging scenarios. In Proc. 18th IEEE International Conference on Data Mining, 2018.
10. T. Gorecki and M. Luczak. Using derivatives in time series classification. Data Mining and Knowledge Discovery, 26(2):310–331, 2013.


11. J. Hills, J. Lines, E. Baranauskas, J. Mapp, and A. Bagnall. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery, 28(4):851–881, 2014.
12. D. Hirschberg. Algorithms for the longest common subsequence problem. Journal of the ACM, 24(4):664–675, 1977.
13. Y. Jeong, M. Jeong, and O. Omitaomu. Weighted dynamic time warping for time series classification. Pattern Recognition, 44:2231–2240, 2011.
14. I. Karlsson, P. Papapetrou, and H. Bostrom. Generalized random shapelet forests. Data Mining and Knowledge Discovery, 30(5):1053–1085, 2016.
15. E. Keogh and M. Pazzani. Derivative dynamic time warping. In Proc. 1st SIAM International Conference on Data Mining (SDM), 2001.
16. J. Lines and A. Bagnall. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery, 29:565–592, 2015.
17. J. Lines, S. Taylor, and A. Bagnall. Time series classification with HIVE-COTE: The hierarchical vote collective of transformation-based ensembles. ACM Transactions on Knowledge Discovery from Data, 12(5), 2018.
18. M. Löning, A. Bagnall, S. Ganesh, V. Kazakov, J. Lines, and F. J. Király. A unified interface for machine learning with time series. ArXiv e-prints, arXiv:1909.07872, 2019.
19. B. Lucas, A. Shifaz, C. Pelletier, L. O'Neill, N. Zaidi, B. Goethals, F. Petitjean, and G. Webb. Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery, 33(3):607–635, 2019.
20. P. Marteau. Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):306–318, 2009.
21. M. Middlehurst, W. Vickers, and A. Bagnall. Scalable dictionary classifiers for time series classification. ArXiv e-prints, arXiv:1907.11815, 2019.
22. C. Ratanamahatana and E. Keogh. Three myths about dynamic time warping data mining. In Proc. 5th SIAM International Conference on Data Mining, 2005.
23. P. Schäfer. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530, 2015.
24. P. Schäfer and U. Leser. Fast and accurate time series classification with WEASEL. In Proc. 2017 ACM Conference on Information and Knowledge Management, pages 637–646. ACM, 2017.
25. A. Shifaz, C. Pelletier, F. Petitjean, and G. Webb. TS-CHIEF: A scalable and accurate forest algorithm for time series classification. ArXiv e-prints, arXiv:1906.10329, 2019.
26. A. Stefan, V. Athitsos, and G. Das. The Move-Split-Merge metric for time series. IEEE Transactions on Knowledge and Data Engineering, 25(6):1425–1438, 2013.