1
Fast Calculations of Simple Primitives in Time Series
Dennis Shasha, Department of Computer Science
Courant Institute of Mathematical Sciences, New York University
Joint work with Richard Cole, Xiaojian Zhao (correlation), Zhihua Wang (humming), Yunyue Zhu (both), and Tyler Neylon (svds, trajectories)
2
Roadmap
Section 1: Motivation
Section 2: StatStream: A Fast Sliding-Window-Based Correlation Detector
Problem Statement; Cooperative and Uncooperative Time Series; Algorithmic Framework; DFT-Based Scheme and Random Projection; Combinatorial Design and Bootstrapping; Empirical Study
Section 3: Elastic Burst Detection
Problem Statement; Challenge; Shifted Binary Tree; Astrophysical Application
3
Overall Motivation
Financial time series streams are watched closely by millions of traders. What exactly do they look for, and how can we help them do it faster? Typical query: “Which pairs of stocks had highly correlated returns over the last three hours?”
Physicists study the time series emerging from their sensors. Typical query: “Do there exist bursts of gamma rays in windows of any size from 8 milliseconds to 4 hours?”
Musicians produce time series. Typical query: “Even though I can’t hum well, please find this song. I want the CD.”
4
Why Speed Is Important
As processors speed up, algorithmic efficiency no longer matters … one might think.
True if problem sizes stay the same, but they don’t.
As processors speed up, sensors improve: satellites spew out a terabyte a day, magnetic resonance imagers give higher-resolution images, etc.
Desire for real-time response to queries.
5
Surprise, surprise
More data, real-time response, increasing importance of correlation IMPLIES
Efficient algorithms and data management more important than ever!
6
Section 2:
StatStream: A Fast Sliding-Window-Based Correlation Detector
7
Scenario: stock price streams
The New York Stock Exchange (NYSE): 50,000 securities (streams); 100,000 ticks (trade and quote).
Pairs Trading, a.k.a. Correlation Trading. Query: “Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?”
XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC …
8
Motivation: Online Detection of High Correlation
[Figure: stream pairs flagged “Correlated!” within sliding windows.]
9
Problem Statement
Synchronous time series window correlation: Given Ns streams, a start time t_start, and a window size w, find, for each time window W of size w, all pairs of streams S1 and S2 such that S1 during time window W is highly correlated with S2 during the same time window.
(Possible time windows are [t_start, t_start + w − 1], [t_start + 1, t_start + w], …, where t_start is some start time.)
Asynchronous correlation: allow shifts in time. That is, given Ns streams and a window size w, find all time windows W1 and W2 with |W1| = |W2| = w and all pairs of streams S1 and S2 such that S1 during W1 is highly correlated with S2 during W2.
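A brute-force baseline makes the cost concrete. The sketch below (illustrative, not the StatStream implementation) computes the Pearson correlation of every pair of streams over one window; its O(Ns² · w) per-window cost is what the rest of this section is designed to avoid:

```python
import numpy as np

def correlated_pairs(streams, w, t, threshold=0.9):
    """Naive synchronous-correlation baseline: for the window
    [t, t+w), compute the Pearson correlation of every pair of
    streams and report those above the threshold."""
    windows = [np.asarray(s[t:t + w], dtype=float) for s in streams]
    pairs = []
    for i in range(len(windows)):
        for j in range(i + 1, len(windows)):
            r = np.corrcoef(windows[i], windows[j])[0, 1]
            if r > threshold:
                pairs.append((i, j, r))
    return pairs
```

With 50,000 streams this is over a billion pairwise correlations per window, which is why the digest-and-grid machinery that follows is needed.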
10
Cooperative and Uncooperative Time Series
Cooperative time series exhibit a fundamental degree of regularity, at least over the short term, allowing long time series to be compressed to a few coefficients with little loss of information using data-reduction techniques such as Fourier Transforms and Wavelet Transforms.
Example: stock price time series.
Uncooperative time series: regularities are absent; the series resembles noise.
Example: stock return time series (difference in price / average price).
11
Algorithmic Framework
Basic definitions:
Timepoint: the smallest unit of time over which the system collects data, e.g., a second.
Basic window: a consecutive subsequence of timepoints over which the system maintains a digest (i.e., a compressed representation) and returns results, e.g., two minutes.
Sliding window: a user-defined consecutive subsequence of basic windows over which the user wants statistics, e.g., an hour. The user might ask, “Which pairs of streams were correlated with a value of over 0.9 for the last hour?”, and then again 2 minutes later.
12
Definitions: Sliding window and Basic window
[Figure: stocks 1 … n on a time axis; sliding window size = 8, basic window size = 2; each basic window spans two time points.]
13
Algorithmic Strategy (cooperative case)
[Diagram: time series 1 … n → dimensionality reduction (DFT, DWT, SVD) → digests 1 … n → grid structure → correlated pairs.]
14
GEMINI framework (Faloutsos et al.)
The transformation ideally has the lower-bounding property.
15
DFT based Scheme*
[Diagram: the sliding window is composed of basic windows of time points; each basic window keeps digests (its sum and its first DFT coefficients).]
*D. Shasha and Y. Zhu. High Performance Discovery in Time Series: Techniques and Case Studies. Springer, 2004.
16
Incremental Processing
Compute the DFT one basic window at a time. Then add (with angular shifts) to get a DFT for the whole sliding window. The time is just the DFT time for a basic window plus time proportional to the number of DFT components we need.
Using the first few DFT coefficients for the whole sliding window, represent the sliding window by a point in a grid structure.
We end up having to compare very few time windows, so a potentially quadratic comparison problem becomes linear in practice.
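The per-point version of this incremental idea is the classic sliding DFT: when the window advances by one point, each kept coefficient is updated with an angular shift instead of being recomputed. StatStream advances by whole basic windows rather than single points, but the recurrence below shows the same principle (a minimal sketch, not the paper’s exact scheme; the function name is illustrative):

```python
import numpy as np

def sliding_dft_step(X, x_old, x_new, n):
    """Update the first len(X) DFT coefficients of a size-n window
    when the window slides forward by one point:
        X'_k = e^{2*pi*i*k/n} * (X_k - x_old + x_new)
    Cost is O(1) per kept coefficient, versus O(n log n) for a
    full recomputation."""
    k = np.arange(len(X))
    return np.exp(2j * np.pi * k / n) * (X - x_old + x_new)
```

The update can be checked against a direct FFT of the shifted window; only the first few coefficients need to be carried, which is exactly what the grid structure consumes.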
17
Grid Structure
Each window maps to a point x = (x1, x2, …, xk) in a k-dimensional grid.
18
Problem: Doesn’t always work
DFT approximates price-like data very well. However, it is poor for stock returns: (today’s price − yesterday’s price) / yesterday’s price.
A return series is more like white noise, which contains all frequency components.
DFT uses the first n (e.g., 10) coefficients to approximate the data, which is insufficient in the case of white noise.
19
[Figure: fraction of total energy captured versus number of DFT coefficients, for a random walk and for white noise. For the random walk the first few coefficients capture most of the energy; for white noise the energy grows only gradually with the number of coefficients.]
DFT on a random walk (works well) and on white noise (works badly).
20
Random Projection: Intuition
You are walking in a sparse forest and you are lost. You have an outdated cell phone without a GPS. You want to know if you are close to your friend.
You identify yourself as 100 meters from the pointy rock and 200 meters from the giant oak, etc. If your friend is at similar distances from several of these landmarks, you might be close to one another.
Random projections are analogous to these distances to landmarks.
21
How to compute a Random Projection*
Random vector pool: a list of random vectors drawn from a stable distribution (like the landmarks).
Project the time series into the space spanned by these random vectors. The Euclidean distance (and hence correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee.
Note: sketches do not provide approximations of individual time series windows; they help make comparisons.
* W. B. Johnson and J. Lindenstrauss. “Extensions of Lipschitz mappings into a Hilbert space.” Contemp. Math., 26:189–206, 1984.
Random vectors (the landmarks):
R1 = (r1,1, r1,2, r1,3, …, r1,w)
R2 = (r2,1, r2,2, r2,3, …, r2,w)
R3 = (r3,1, r3,2, r3,3, …, r3,w)
R4 = (r4,1, r4,2, r4,3, …, r4,w)
Raw time series:
x = (x1, x2, x3, …, xw)
y = (y1, y2, y3, …, yw)
Sketches (inner products with the random vectors):
skx = (skx1, skx2, skx3, skx4), sky = (sky1, sky2, sky3, sky4)

Random Projection
[Diagram: X’s and Y’s current positions estimated from their relative distances to landmarks (rocks, buildings, …).]
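A minimal sketch computation, using Gaussian random vectors (a common member of the stable-distribution family the slides mention; the function name and the 1/√k normalization are choices made here for illustration):

```python
import numpy as np

def make_sketches(series, k, seed=0):
    """Project each length-w series onto k random Gaussian vectors.
    With the 1/sqrt(k) normalization, the Euclidean distance between
    two sketches approximates the distance between the raw series
    (Johnson-Lindenstrauss)."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)   # shape (n_series, w)
    w = series.shape[1]
    R = rng.standard_normal((k, w))            # the random vector pool
    return series @ R.T / np.sqrt(k)           # shape (n_series, k)
```

Note that, as the slide says, a sketch is useless as a reconstruction of its window; it is only the pairwise distances that survive the projection.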
23
Sketch Guarantees
Johnson-Lindenstrauss Lemma: For any 0 < ε < 1 and any integer n, let k be a positive integer such that
k ≥ 4 (ε²/2 − ε³/3)⁻¹ ln n.
Then for any set V of n points in R^d, there is a map f : R^d → R^k such that for all u, v ∈ V,
(1 − ε) ‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε) ‖u − v‖².
Further, this map can be found in randomized polynomial time.
24
Empirical Study: sketch distance/real distance
[Figure: histograms of the factor (real distance / sketch distance) for sketch sizes 30, 80, and 1000; the distribution concentrates around 1 as the sketch size grows.]
25
Empirical Comparison : DFT, DWT and Sketch
Stock return data, about 1,000 data points.
[Figure: approximated distance per data point under DFT, DWT, and sketches, plotted against the true distance; on this uncooperative data the sketch tracks the true distance much more closely than DFT or DWT.]
26
Algorithm overview using random projections/sketches
Partition each sketch vector s of size N into groups of some size g.
The i-th group of each sketch vector s is placed in the i-th grid structure (of dimension g).
If two sketch vectors s1 and s2 are within distance c·d, where d is the target distance, in more than a fraction f of the groups, then the corresponding windows are candidate highly correlated windows and should be checked exactly.
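A direct rendering of this filter, with the grid lookup replaced by explicit group-distance tests for clarity (the function name and the pairwise loop are illustrative; a real grid structure avoids comparing all pairs):

```python
import numpy as np

def candidate_pairs(sketches, g, c, f, d):
    """Partition each size-N sketch into groups of size g; a pair
    whose group distance is within c*d in more than a fraction f
    of the groups becomes a candidate for exact checking.
    Parameters g, c, f, d follow the slide."""
    n, N = sketches.shape
    groups = sketches.reshape(n, N // g, g)    # (n, N/g, g)
    cands = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(groups[i] - groups[j], axis=1)
            if np.mean(dist <= c * d) > f:     # fraction of close groups
                cands.append((i, j))
    return cands
```

In the actual system each group indexes into its own grid, so close groups are found by hashing rather than by this quadratic scan; the candidate test itself is the same.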
27
Optimization in Parameter Space
Next, how to choose the parameters g, c, f, N?
Size of sketch (N): 30, 36, 48, 60
Group size (g): 1, 2, 3, 4
Distance multiplier (c): 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3
Fraction (f): 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
28
Optimization in Parameter Space
Essentially, we prepare several groups of good parameter candidates and choose the best one to apply to the given data.
But how to select the good candidates? Combinatorial design (CD) and bootstrapping.
29
Combinatorial Design
Pair-wise combinations of all the parameters. Informally: each value of parameter X is combined with each value of parameter Y in at least one experiment, for all X, Y.
Example: if there are four parameters having respectively 4, 4, 13, and 10 values, exhaustive search requires 2080 experiments vs. 130 for pair-wise combinatorial design.
*http://www.cs.nyu.edu/cs/faculty/shasha/papers/comb.html
30
Exploring the Neighborhood around the Best Values
Because combinatorial design is NOT exhaustive, we may not find the optimal combination of parameters at first.
Solution: when good parameter values are found, their local neighbors are searched further for better solutions.
31
How Bootstrapping Is Used
Goal: test the robustness of a conclusion on a sample data set by creating new samples from the initial sample, with replacement.
Procedure: start from a sample set with 1,000,000 pairs of time series windows. Among them, choose 20,000 sample points with replacement. Compute the recall and precision each time. Repeat many times (e.g., 100 or more).
32
Testing for stability
Bootstrap 100 times. Compute the mean and standard deviation of the recalls and precisions. What we want from good parameters:
mean(recall) − std(recall) > threshold(recall)
mean(precision) − std(precision) > threshold(precision)
If there are no such parameters, enlarge the replacement sample size.
Inner products with random vectors r1, r2, r3, r4, r5, r6 give sketches for streams X, Y, Z:
skx = (skx1, skx2, skx3, skx4, skx5, skx6)
sky = (sky1, sky2, sky3, sky4, sky5, sky6)
skz = (skz1, skz2, skz3, skz4, skz5, skz6)
The coordinate pairs (sk1, sk2), (sk3, sk4), (sk5, sk6) of each sketch are placed in three separate grid structures.
[Figures: mean and standard deviation of the ratio DFT distance / real distance, and of sketch distance / real distance, over practical data sets (buoy_sensor, evaporator, foetal_ecg, spot_exrates, steamgen, winding, price, return, and others).]
36
Experiments
Comparison of processing time.
[Figure: wall clock time (seconds) of sketch, DFT, and scan over the practical data sets.]
37
Section 3: Elastic Burst Detection
38
Elastic Burst Detection: Problem Statement
Problem: Given a time series of positive numbers x1, x2, …, xn and a threshold function f(w), w = 1, 2, …, n, find the subsequences of any size whose sums are above the thresholds: all 0 < w < n, 0 < m < n − w such that x_m + x_{m+1} + … + x_{m+w−1} ≥ f(w).
Brute-force search: O(n²) time.
Our Shifted Binary Tree (SBT): O(n + k) time, where k is the size of the output, i.e., the number of windows with bursts.
39
Burst Detection: Challenge
This is a single-stream problem. What makes it hard is that we are looking at multiple window sizes at the same time.
The naïve approach handles one window size at a time.
40
Astrophysical Application
Motivation: In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise. An unusual event burst may signal an event interesting to physicists.
Technical overview:
1. The sky is partitioned into 1800 × 900 buckets.
2. 14 sliding window lengths are monitored, from 0.1 s to 39.81 s.
3. The original code implements the naïve window-at-a-time algorithm and can’t handle more windows.
41
Bursts across different window sizes in Gamma Rays
Challenge: to discover not only the time of the burst, but also its duration.
42
Shifted Binary Tree (SBT)
Define the threshold for a node of size 2^k to be the threshold for a window of size 1 + 2^(k−1).
43
Burst Detection using SBT
Any window of size w, with 2^(i−1) + 2 ≤ w ≤ 2^i + 1, is included in one of the windows at level i + 1.
For a non-negative data stream and a monotonic aggregation function, if a node at level i + 1 doesn’t exceed the threshold for window size 2^(i−1) + 2, none of the windows of sizes between 2^(i−1) + 2 and 2^i + 1 beneath it can contain a burst; otherwise a detailed search tests for real bursts.
This filters out many windows, reducing the CPU time dramatically.
Shortcoming: the structure is fixed. It can do badly if bursts are very unlikely or relatively likely.
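A compact sketch of the filter-then-search idea (illustrative only: a real SBT maintains the level sums incrementally over a stream instead of recomputing them, and `max_w` caps the window sizes of interest):

```python
import numpy as np

def sbt_bursts(x, f, max_w):
    """Shifted-Binary-Tree-style burst detection.  Level i+1 holds
    sums over length-2^(i+1) windows shifted by 2^i; if such a node's
    sum stays below f(2^(i-1)+2), no window of size in
    [2^(i-1)+2, 2^i+1] inside it can burst, so only flagged nodes
    get a detailed search.  Assumes x >= 0 and f non-decreasing."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    bursts = set()
    for w in (1, 2):                          # smallest sizes: direct check
        for m in range(n - w + 1):
            if x[m:m + w].sum() >= f(w):
                bursts.add((m, w))
    i = 1
    while 2 ** i <= n and 2 ** (i - 1) + 2 <= max_w:
        L, s = 2 ** (i + 1), 2 ** i           # node length and shift
        lo, hi = 2 ** (i - 1) + 2, 2 ** i + 1 # sizes handled at this level
        for start in range(0, n, s):
            node = x[start:start + L]
            if node.sum() >= f(lo):           # filter test
                for w in range(lo, min(hi, max_w, len(node)) + 1):
                    for m in range(len(node) - w + 1):
                        if node[m:m + w].sum() >= f(w):
                            bursts.add((start + m, w))
        i += 1
    return sorted(bursts)
```

Because consecutive level-(i+1) nodes overlap by 2^i points, every window of size up to 2^i + 1 lies entirely inside some node, so the filter never misses a burst; quiet regions cost one sum test per node.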
44
Shifted Aggregation Tree
A hierarchical tree structure: each node is an aggregate. It differs from the SBT in two ways:
■ Parent-child structure: defines the topological relationship between a node and its children.
■ Shifting pattern: defines how many time points apart two neighboring nodes at the same level are.
45
Aggregation Pyramid (AP)
An N-level isosceles-triangle-shaped data structure built on a sliding window of length N.
Level 0 has a one-to-one correspondence to the input time series.
Level h stores the aggregates of h+1 consecutive elements, i.e., a sliding window of length h+1.
The AP stores every aggregate for every window size starting at every time point.
46
Aggregation Pyramid properties:
45° diagonal: same starting time. 135° diagonal: same ending time.
Shadow of cell(t,h): the sliding window starting at time t and ending at t+h−1.
Coverage of cell(t,h): all the cells in the sub-pyramid rooted at cell(t,h).
Overlap of cell(t1,h1) and cell(t2,h2): the cell at the intersection of the 135° diagonal touching cell(t1,h1) and the 45° diagonal touching cell(t2,h2).
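A full Aggregation Pyramid is easy to build for a small window; the sketch below materializes every cell (t, h), which real structures (the SBT, Shifted Aggregation Trees) deliberately avoid by keeping only a sparse subset of levels:

```python
import numpy as np

def aggregation_pyramid(x):
    """Build the full AP over window x: level h holds, for each
    start t, the sum of the h+1 consecutive elements x[t..t+h],
    so level 0 is the series itself and the apex is the total sum.
    O(N^2) cells and time for a length-N window."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ap = [x.copy()]
    for h in range(1, n):
        prev = ap[-1]
        # cell(t, h) = cell(t, h-1) + x[t + h]
        ap.append(prev[:n - h] + x[h:])
    return ap
```

The quadratic cell count is exactly why the question on this slide matters: a good burst-detection structure keeps just enough of these cells to preserve the filtering guarantee.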
47
Embed the Shifted Binary Tree in the Aggregation Pyramid
48
Aggregation Pyramid as a Host Data Structure
Many structures besides the Shifted Binary Tree fit in an Aggregation Pyramid.
The update-filter-search framework guarantees detection of all the bursts as long as the structure includes the level-0 cells and the top-level cell.
What kinds of structures are good for burst detection?
49
Which Shifted Aggregation Tree should be used?
Many Shifted Aggregation Trees are available, and all of them guarantee detection of all the bursts. Which structure should be used?
Intuitively, the denser a structure, the more updating time and the less detailed-search time, and vice versa.
We want the structure minimizing the total CPU running time, given the input.
50
State-space Algorithm
View a Shifted Aggregation Tree (SAT) as a state.
View the growth from one SAT to another as a transformation between states.
51
State-space Algorithm
Initial state: a Shifted Aggregation Tree (SAT) containing only level 0.
Transformation rule: if adding one level to the top of SAT B yields SAT A, state B is transformed to state A.
Final state: a SAT that can cover the maximum window size of interest.
Traversing strategy: best-first search. Associate each state with a cost.
Prune: to explore more reasonable structures.
52
Results
The Shifted Aggregation Tree outperforms the Shifted Binary Tree: a factor of 35 speedup in some experiments.
The Shifted Aggregation Tree can adjust its structure to adapt to different inputs.
[Figures: CPU time (ms) vs. λ (0.001–1000, Poisson data) and vs. threshold (burst probability p = 10^-k, k = 2 … 10) for SAT, SBT, and naïve search.]
53
Greedy Dynamic Burst Detection
Real-world data keeps changing. Detection is poor if the training data differs significantly from the data to be detected: 10%–250% more detection time, as shown in the figure below.
[Figure: CPU time (ms) of the statically trained structure on training sets with different λ (0.8–1.2).]
54
Ideas
Basic idea: change a structure if the change helps reduce the overall cost.
Greedy when making a structure denser: if the saved detailed-search cost is greater than the added updating/filtering cost, add the level.
Delayed greedy when making a structure sparser: alarms tend to occur in clusters, across multiple sizes and multiple neighboring windows, and a lower level may support a higher level, so don’t remove a level if an alarm occurred recently.
55
Algorithm Sketch
Start with a structure trained using the state-space algorithm.
If an alarm is raised at level k:
If adding a level between level k and level k−1 saves cost, add it.
If the addition is blocked because a supporting lower level is missing from the structure, add the lower level.
If the addition is blocked because the shift doesn’t satisfy the Shifted Aggregation Tree property, legally narrow the shifts.
Else, if the aggregate at level k doesn’t exceed the minimum threshold for level k−1, and no alarm occurred recently:
If it is legal to remove level k−1, remove it; else legally double the shift.
56
Results
Different training sets: the dynamic algorithm overcomes the discrepancy resulting from biased training data.
Different sets of window sizes: when the number of window sizes is small, the dynamic algorithm performs slightly less well than the static algorithm.
[Figures: CPU time (ms) vs. number of window sizes (10–100) and vs. training sets with different λ (0.8–1.2), static vs. dynamic, Poisson data.]
57
Volume Spike Detection in Stock Trading
Trading volume indicates buying/selling interest, the underlying force for price movements.
Volume spike: a burst of trading activity, a signal in program trading.
High rate: more than 100 updates per second per stock.
(from marketvolume.com)
58
Volume Spike Detection in Stock Trading
Setup: tick-by-tick trading activity of the IBM stock, Jan. 2001 – May 2004; 23 million time points; exponential distribution.
Results: real-time response, 0.01 ms per time point on average; 3–5× speedup over the Shifted Binary Tree; the dynamic algorithm performs slightly better than the static algorithm.
[Figures: CPU time vs. threshold (p = 10^-k), vs. max window size of interest (10–1800), and vs. number of window sizes (10–240), for SBT, static SAT, and dynamic SAT on the IBM data.]
59
Click Fraud Detection in Website Traffic Monitoring
Setup:
■ 2003 Sloan Digital Sky Survey web traffic log, the same type of data as click data.
■ Number of requests within each second.
■ 31 million time points.
■ Poisson distribution.
Results: 2–5× speedup over the Shifted Binary Tree; the dynamic algorithm performs better than the static algorithm.
[Figures: CPU time vs. threshold (p = 10^-k), vs. number of window sizes (10–240), and vs. max window size of interest (10–1800), for SBT, static SAT, and dynamic SAT on the SDSS data.]
60
Fast and Accurate Time Series Matching with Time-Warping
61
Outline
Problem Statement; Related Work Review; Case Study: Query by Humming; Future Work
63
Goal of this work
Goal
Build fast and accurate similarity-search algorithms for large-scale time series systems that allow complex time shifting in the query.
Two major challenges: query ambiguity, and the large size of the database.
64
Related Work Review
GEMINI framework: introduced by C. Faloutsos, M. Ranganathan, and Y. Manolopoulos to avoid linear-scan comparison.
Dynamic Time Warping (DTW): introduced by D. Berndt and J. Clifford to allow time shifting in time series comparison.
Envelope and envelope transforms: introduced by E. Keogh to index DTW distance; generalized into the GEMINI framework by our group.
65
Dynamic Time Warping
The DTW distance between two time series x and y equals the cost of an optimal path:
• Each path (1,1) → (m,n) is an alignment.
• (i,j) represents aligning x(i) with y(j).
• cost(i,j) equals |x(i) − y(j)|.
• The optimal path has minimum total cost.
[Figure: warping alignment between Time Series 1 and Time Series 2.]
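The optimal-path formulation translates directly into dynamic programming; a textbook O(mn) sketch (with no warping-width constraint, which practical systems usually add):

```python
import numpy as np

def dtw(x, y):
    """Classic DTW: cost(i,j) = |x[i]-y[j]| and
    D(i,j) = cost(i,j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1));
    D(m-1,n-1) is the minimum total cost over all alignments."""
    m, n = len(x), len(y)
    D = np.full((m, n), np.inf)
    for i in range(m):
        for j in range(n):
            cost = abs(x[i] - y[j])
            if i == 0 and j == 0:
                D[i, j] = cost
            else:
                best = min(D[i - 1, j] if i else np.inf,
                           D[i, j - 1] if j else np.inf,
                           D[i - 1, j - 1] if i and j else np.inf)
                D[i, j] = cost + best
    return D[-1, -1]
```

Repeating a value in one series costs nothing, which is exactly the time-shifting tolerance the humming application needs.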
66
Problem with DTW Distance
DTW distance does not obey the triangle inequality, which most standard indexing methods require.
67
Envelope and Envelope Transform
• Envelope Filter
• Transformed Envelope Filter
Both filter out bad candidates at lower computing cost and guarantee no false negatives.
[Diagram: filtering in the transformed feature space.]
68
Example of an envelope transform: Piecewise Aggregate Approximation (PAA).
[Figure: the original time series with its upper and lower envelopes, and the transformed envelopes U_new and L_new.]
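A sketch of PAA and a PAA-style envelope, assuming a warping radius r and a series length divisible by the segment count (illustrative; see Keogh’s papers for the exact lower-bounding construction used in indexing):

```python
import numpy as np

def paa(x, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of
    n_segments equal-length pieces (len(x) must be divisible)."""
    x = np.asarray(x, dtype=float)
    return x.reshape(n_segments, -1).mean(axis=1)

def paa_envelope(x, n_segments, r):
    """Upper/lower envelope for DTW with warping radius r: the
    pointwise max/min over [i-r, i+r], then aggregated per segment
    with max (upper) and min (lower) so the bound is preserved."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    upper = np.array([x[max(0, i - r):i + r + 1].max() for i in range(n)])
    lower = np.array([x[max(0, i - r):i + r + 1].min() for i in range(n)])
    u_new = upper.reshape(n_segments, -1).max(axis=1)
    l_new = lower.reshape(n_segments, -1).min(axis=1)
    return u_new, l_new
```

Any warped alignment of a query within radius r stays inside the envelope, so comparing a candidate’s PAA against (u_new, l_new) can reject it cheaply without risking a false negative.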
69
Case Study: Query by Humming
Application-specific challenges; related work; proposed framework; experiments.
[Diagram: humming is the query to a music database system, which returns similar music segments.]
70
Challenges
People may hum out of tune: different base key, inaccurate relative pitch, unstable pitch, different tempo, varying tempo.
It is hard to segment notes in the humming.
71
Flourishing Literature
String-represented note-sequence matching [A. Uitdenbogerd et al. “Matching techniques for large music databases.” ACM Multimedia 1999]
Data-reduction transformations (STFT) [N. Kosugi et al. “A practical query-by-humming system for a large music database.” ACM Multimedia 2000]
Melody slope matching [Yongwei Zhu et al. “Pitch tracking and melody slope matching for song retrieval.” Advances in Multimedia Information Processing, PCM 2001]
Dynamic Time Warping (DTW) on pitch contour [Y. Zhu and D. Shasha. “Warping indexes with envelope transforms for query by humming.” ACM SIGMOD 2003]
String editing on note sequences combined with rhythm alignment [W. Archer. “Methods for retrieving musical information based on rhythm and pitch correlation.” CSGSC 2003]
72
Problems with Related Work
It is difficult to do performance comparisons: there is no standard for evaluation, data sets and test sets differ, and definitions of accuracy are not reliable.
General conclusions are nevertheless possible: warped distance measures are more robust, and warped distance matching needs to scale up.
73
Experiment on Scaling Up
Test set: 41 hummed tunes from Beatles songs, collected from both amateurs and professionals, each recognizable by more than three persons.
Two data sets:
A. 53 Beatles songs (included in B)
B. 1032 songs, including 123 Beatles songs, 908 American rock-and-pop songs, and one piece of Chinese game music
Top-K hit rate = (# recognized in the top-K list) / (# recognized by humans)
* Both systems are optimized using another test set of 17 hummings.
74
Framework Proposal
[Diagram: humming with ‘ta’ → segment notes → note/duration sequence as the query criteria. Keywords pass through a keyword filter; statistics-based features feed a boosted feature filter (trained by boosting); melody (notes) goes through nearest-N search on DTW distance with a transformed envelope filter, producing a top-N′ match; an alignment verifier using rhythm (duration) and melody (notes) then produces the top-N match. Each stage consults the database.]
75
Important Heuristic: ‘ta’ Based Humming*
Solves the problem of note segmentation in most cases.
Compare humming with ‘la’ versus humming with ‘ta’.
* Idea from N. Kosugi et al. “A practical query-by-humming system for a large music database.” ACM Multimedia 2000.
76
Benefits of ‘ta’-style humming: it decreases the size of the time series by orders of magnitude, thus reducing the computation of the DTW distance.
77
Important Heuristic: Statistics-Based Filters *
Low-dimensional statistical features have a lower computation cost than DTW distance and quickly filter out true negatives.
Example: filter out candidates whose note length is much larger/smaller than the query’s note length.
More features: standard deviation of note values; zero-crossing rate of note values; number of local minima/maxima of note values; histogram of note values.
* Intuition from Erling Wold et al. “Content-based classification, search and retrieval of audio.” IEEE Multimedia 1996. http://www.musclefish.com
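Such filters are cheap to compute; the helpers below show the flavor (the function names and the 30% length tolerance are illustrative assumptions, not the system’s tuned values):

```python
import numpy as np

def note_features(notes):
    """Cheap statistics usable as pre-filters before DTW: length,
    std of note values, zero-crossing rate of the mean-removed
    sequence, and the count of local extrema."""
    notes = np.asarray(notes, dtype=float)
    centered = notes - notes.mean()
    zcr = np.mean(np.diff(np.sign(centered)) != 0)
    d = np.diff(notes)
    extrema = np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0)
    return {"length": len(notes), "std": notes.std(),
            "zcr": zcr, "extrema": int(extrema)}

def length_filter(query, candidate, tol=0.3):
    """Reject candidates whose note count differs from the query's
    by more than a fraction tol; quick, but with no false-negative
    guarantee, matching the slide's caveat."""
    return abs(len(candidate) - len(query)) <= tol * len(query)
```

Because these features are one-dimensional numbers, millions of candidates can be screened before any DTW computation is attempted.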
78
Important Heuristic: Boosting
Characteristics of statistics-based filters: quick, but with no guarantee of no false negatives; they become weak classifiers under bad parameter settings; hence they are ideal candidates for boosting.
Boosting*: “an algorithm for constructing a ‘strong’ classifier using only a training set and a set of ‘weak’ classification algorithms”; “a particular linear combination of these weak classifiers is used as the final classifier, which has a much smaller probability of misclassification.”
* Cynthia Rudin et al. “On the Dynamics of Boosting.” In Advances in Neural Information Processing Systems, 2004.
79
Important Heuristic: Alignment Verification
Nearest-N search uses only melody information, which does not guarantee that the rhythm will match.
Will A. Arentz et al. suggest combining rhythm and melody information in the similarity comparison. Results are generally better than using melody information alone, but this is not appropriate when the sum of several notes’ durations in the query may correspond to the duration of one note in the candidate.
Our method:
1. Use melody information for DTW distance computation.
2. Reject candidates that have bad local note alignment.
3. Merge durations appropriately based on the note alignment.
4. Reject candidates that have bad duration alignment.
80
Experiments
Data set: 1,032 songs, divided and organized into 73,051 melodic segments.
Computing environment: Pentium IV 2.4 GHz, 768 MB memory (all the data can be loaded); K, an array-processing language.
Query criteria: return the top-15 list with DTW distance less than 0.5; allow 0.05 · N local time shifting for a query with N notes.
81
Experiment One: Human Humming
Query set: 109 ‘ta’-style hummings: the previous 41 hummings of Beatles songs, 65 hummings of American rock-and-pop songs, and 3 hummings of the Chinese game music.
Each is recognizable by at least two persons; the number of notes varies from 6 to 24.
82
Future Work
Model and estimate the error more accurately; analyze the relationship between the algorithm’s performance and observed humming errors.
Build a standard benchmark to evaluate and compare different QbH systems.
Investigate more lower-bounding filters at lower levels.
Investigate more classifiers to boost.
Build an intelligent system that improves itself by adjusting to a particular user’s humming patterns.
83
Themes and Approaches
Our approach: take simple problems and make them fast.
Correlation and related primitives (e.g., matching pursuit) are simple, but we want to do them for many thousands of time series, incrementally and in near-linear time.
Burst detection over many window sizes in near-linear time.
Query by humming: large-scale and accurate. Coming to a store near you?