Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series

X Liu, S Swift & A Tucker, Department of Computer Science, Birkbeck College, University of London


Post on 03-Jan-2016


Page 1

Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series

X Liu, S Swift & A Tucker
Department of Computer Science
Birkbeck College
University of London

Page 2

Page 3

Page 4

MTS Applications at Birkbeck

Screening
Forecasting
Explanation

Page 5

Forecasting

Predicting Visual Field Deterioration of Glaucoma Patients
Function Prediction for Novel Proteins from Multiple Sequence/Structure Data

Page 6

Explanation

Input (observations):

t - 0 : Tail Gas Flow in_state 0
t - 3 : Reboiler Temperature in_state 1

Output (explanation):

t - 7 : Top Temperature in_state 0 with probability=0.92
t - 54 : Feed Rate in_state 1 with probability=0.71
t - 75 : Reactor Temperature in_state 0 with probability=0.65

Page 7

The Gaps

Screening: Automatic / Semi-Automatic Analysis of Outliers
Forecasting: Analysing Short Multivariate Time Series
Explanation: Coping with Huge Search Spaces

Page 8

The Problem - What/Why/How

Short-Term Forecasting of Visual Field Progression
Using a Statistical MTS Model
The Vector Auto-Regressive Process - VAR(P)
There Could be Problems if the MTS is Short
A Modified Genetic Algorithm (GA) can be Used
VARGA

The Prediction of Visual Field Deterioration Plays an Important Role in the Management of the Condition

Page 9

Background - The Dataset

The interval between tests is about 6 months
Typically, 76 points are measured
The number of tests can range between 10 and 44
Values range between 60 = very good and 0 = blind

[Figure: map of the 76 numbered visual field test points (Right Eye). Legend: Points used in this paper (Right Eye); Usual Position of Blind Spot (Right Eye)]

Page 10

Background - The VAR Process

Vector Auto-Regressive Process of Order P: VAR(P)

$$x(t) = \sum_{i=1}^{P} A_i \, x(t-i) + \varepsilon(t)$$

x(t): VF Test for Data Points at Time t (K×1)
A_i: Parameter Matrix at Lag i (K×K)
x(t-i): VF Test for Data Points at Lag i from t (K×1)
\varepsilon(t): Observational Noise at Time t (K×1)
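
A one-step-ahead forecast from a VAR(P) model, where x(t) is the sum of each parameter matrix A_i applied to x(t-i) plus noise, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the parameter matrices are already known and replaces the noise term by its zero mean. The names `var_forecast` and `forecast_error` are hypothetical, and the squared-error score is an assumption, since the slides do not state which error measure was used.

```python
def mat_vec(matrix, vector):
    """Multiply a K x K matrix (list of rows) by a length-K vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def var_forecast(history, params):
    """One-step-ahead VAR(P) forecast with the noise term set to zero.

    history: list of length-K observations, ordered oldest to newest.
    params:  list of P matrices; params[i - 1] plays the role of A_i.
    """
    forecast = [0.0] * len(history[-1])
    for lag, a_i in enumerate(params, start=1):
        # A_i multiplies x(t - i); history[-1] is x(t - 1).
        term = mat_vec(a_i, history[-lag])
        forecast = [f + t for f, t in zip(forecast, term)]
    return forecast


def forecast_error(actual, predicted):
    """Sum of squared errors over the K variables (assumed score)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Toy VAR(2) with K = 2 variables (made-up numbers).
A1 = [[0.5, 0.0], [0.0, 0.5]]
A2 = [[0.0, 0.25], [0.25, 0.0]]
series = [[4.0, 8.0], [2.0, 6.0]]  # x(t-2), x(t-1)
x_hat = var_forecast(series, [A1, A2])
```

With K = 76 points and as few as 10 tests per patient, a model of this form has far more parameters than observations, which is the short-MTS difficulty the slides refer to.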

Page 11

The Genetic Algorithm

Generate a Population of random Chromosomes (Solutions)
Repeat for a number of Generations
  Cross Over the current Population
  Mutate the current Population
  Select the Fittest for the next Population
Loop

The best solution to the problem is the Chromosome in the last generation which has the highest Fitness

“A Search/Optimisation method that solves a problem through maintaining and improving a population of suitable candidate solutions using biological metaphors”
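
The loop described on this slide can be sketched in a few lines. The version below is an illustrative toy, not the GA used in the talk: the function name `run_ga`, the parameter defaults, the one-point crossover, and the "keep the best of parents and children" selection are all assumptions chosen for brevity, and for simplicity only the children are mutated. The fitness here just counts the 1 bits.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Toy generational GA over bit-list chromosomes."""
    rng = random.Random(seed)
    # Generate a population of random chromosomes (candidate solutions).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Cross over: pair up shuffled parents and swap tails at a random point.
        rng.shuffle(population)
        children = []
        for a, b in zip(population[::2], population[1::2]):
            point = rng.randint(1, n_bits - 1)
            children.append(a[:point] + b[point:])
            children.append(b[:point] + a[point:])
        # Mutate: each gene of each child inverts with probability mutation_rate.
        children = [[1 - g if rng.random() < mutation_rate else g for g in c]
                    for c in children]
        # Select the fittest of parents and children for the next population.
        pool = sorted(population + children, key=fitness, reverse=True)
        population = pool[:pop_size]
    # The answer is the fittest chromosome of the last generation.
    return max(population, key=fitness)


best = run_ga(fitness=sum)
```

Because the selection step keeps the best individuals from the combined pool, the best fitness in this sketch never decreases from one generation to the next.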

Page 12

GAs - Chromosome Example

X: 0-127, encoded as 0000000-1111111
Y: 0-31, encoded as 00000-11111
Chromosome (X.Y): 0000000.00000-1111111.11111
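
The encoding in this example, 7 bits for X (0-127) followed by 5 bits for Y (0-31), can be written out as a short sketch. The function names `encode` and `decode` are hypothetical, and the chromosome is held as a plain string of '0' and '1' characters for readability.

```python
X_BITS, Y_BITS = 7, 5  # 2**7 = 128 values for X, 2**5 = 32 values for Y


def encode(x, y):
    """Pack X and Y into one 12-bit chromosome string."""
    if not (0 <= x < 2 ** X_BITS and 0 <= y < 2 ** Y_BITS):
        raise ValueError("x must be in 0-127 and y in 0-31")
    return format(x, f"0{X_BITS}b") + format(y, f"0{Y_BITS}b")


def decode(chromosome):
    """Split a 12-bit chromosome string back into (x, y)."""
    return int(chromosome[:X_BITS], 2), int(chromosome[X_BITS:], 2)


example = encode(5, 3)
```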

Page 13

GAs - Mutation

Each Bit (gene) of a Chromosome is Given a Chance MP of inverting
A ‘1’ becomes a ‘0’, and a ‘0’ becomes a ‘1’

Before: 01101101
After:  00101111 (the second and seventh bits have inverted)
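
Bit-flip mutation as described on this slide can be sketched as below. The names `mutate` and `flip` are hypothetical. `flip` is a deterministic helper added only so that the slide's example (01101101 becoming 00101111) can be reproduced exactly.

```python
import random


def invert(bit):
    """A '1' becomes a '0', and a '0' becomes a '1'."""
    return "1" if bit == "0" else "0"


def mutate(chromosome, mp, rng):
    """Invert each bit independently with probability mp."""
    return "".join(invert(b) if rng.random() < mp else b for b in chromosome)


def flip(chromosome, positions):
    """Invert the bits at the given 0-based positions."""
    return "".join(invert(b) if i in positions else b
                   for i, b in enumerate(chromosome))


# The slide's example: the second and seventh bits invert.
mutated = flip("01101101", {1, 6})
```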

Page 14

GAs - Crossover (2)

Parents: A = 01011101, B = 11101010
Crossover point: X = 4
Children: C = 01011010, D = 11101101
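
The crossover example on this slide is a one-point crossover: each child takes the first X bits of one parent and the remaining bits of the other. A minimal sketch, with the hypothetical function name `one_point_crossover`:

```python
def one_point_crossover(a, b, x):
    """Swap the tails of two equal-length chromosomes after position x."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if not 0 < x < len(a):
        raise ValueError("crossover point must fall inside the chromosome")
    return a[:x] + b[x:], b[:x] + a[x:]


# Parents A and B from the slide, crossed at X = 4.
c, d = one_point_crossover("01011101", "11101010", 4)
```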

Page 15

VARGA - Representation

Chromosome: A1 A2 ... Am ... Ap

Each parameter matrix Am (m = 1, ..., p) contributes its K×K elements as genes:

am11 … …
… amij …
… … amKK

so A1 runs from a111 to a1KK, A2 from a211 to a2KK, and Ap from ap11 to apKK.
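
One way to read this representation is as a flat list of p × K × K genes that is cut back into p matrices before forecasting. The sketch below assumes a row-major layout within each matrix, which the slide does not specify, and the name `decode_chromosome` is hypothetical. How the bounded natural-number genes are scaled into real-valued coefficients is not stated in the slides and is left out here.

```python
def decode_chromosome(genes, p, k):
    """Cut a flat gene list into p matrices of size k x k (row-major, assumed)."""
    if len(genes) != p * k * k:
        raise ValueError("expected p * k * k genes")
    block = k * k
    return [
        [genes[m * block + i * k: m * block + (i + 1) * k] for i in range(k)]
        for m in range(p)
    ]


# Toy chromosome for p = 2 lags and K = 2 variables.
matrices = decode_chromosome(list(range(8)), p=2, k=2)
```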

Page 16

VARGA - The Genetic Algorithm

GA With Extra Mutation
Order Mutation After Gene Mutation
Parents and Children Mutate (Both)
Genes are Bound Natural Numbers
Fitness is -ve Forecast Error
Minimisation Problem - Roulette Wheel
Run for EACH Patient
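
Roulette wheel selection needs non-negative weights, so a minimisation problem has to turn forecast errors into "bigger is better" slices first. The slides do not say how VARGA does this. The sketch below assumes one common choice, weight = worst error minus own error, plus a tiny constant so the weights never all vanish. The name `roulette_select` is hypothetical.

```python
import random


def roulette_select(errors, n, rng):
    """Pick n indices, favouring chromosomes with lower forecast error."""
    worst = max(errors)
    # Assumed transform: the lower the error, the larger the slice of the wheel.
    weights = [worst - e + 1e-9 for e in errors]
    return rng.choices(range(len(errors)), weights=weights, k=n)


# Three chromosomes; the middle one has by far the lowest error.
picks = roulette_select([10.0, 1.0, 10.0], n=50, rng=random.Random(0))
```

Under this transform the worst chromosome is almost never selected, which is a fairly aggressive choice; rank-based weights would be a gentler alternative.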

Page 17

Evaluation - Methods for Comparison

SPlus: Yule Walker Equations, AIC and Whittle's Recursion, NK(P+1), Standard Package
Holt-Winters Univariate Forecasting Method: Is the Data Univariate? (GA Solution)
Pure Noise Model: VAR(0), Worst Case Forecast (Non-Differenced = 0)
54 out of the Possible 82 Patients' VF Records Could not be Used: SPlus Implementation

Page 18

Results - Graph Comparison

[Chart: Scores for Cases 0 to 6. x-axis: Case Number (0 to 6); y-axis: Score (0 to 2000); series: HW, S-Plus, VARGA, Noise]

The Lower the Score - the Better
Score is the One Step Ahead Forecast Error

Page 19

Results - Table Summary

Average = The Average One Step Forecast Error For the 28 Patients (Both GA’s Fitness)
(The Lower - The Better)

Method   Order (number of patients of each order)   Average Score
VARGA    26 of 1, 2 of 2                            559.82
S-Plus   12 of 0, 14 of 1, 1 of 2, 1 of 3           616.12
HW       N/A                                        683.79
Noise    28 of 0                                    816.53

Page 20

Conclusion - Results

VARGA Has a Better Performance
VARGA Can Model Short MTS
The Visual Field Data is Definitely Multivariate
Data Has a High Proportion of Noise

Page 21

Conclusion - Remarks

Non-Linear Methods and Transformations
Performance Enhancements for the GA
Improve Crossover
Irregularly Spaced Methods
Space-Time Series Methods
Time Dependent Relationships Between Variables

Page 22

Generating Explanations in MTS

Useful to know probable explanations for a given set of observations within a time series
E.g. Oil Refinery: ‘Why a temperature has become high whilst a pressure has fallen below a certain value?’
Possible paradigm which facilitates Explanation is the Bayesian Network
Evolutionary Methods to learn BNs
Extend work to Dynamic Bayesian Networks

Page 23

Dynamic Bayesian Networks

Static BNs repeated over t time slices
Contemporaneous / Non-Contemporaneous Links
Used for Prediction / Diagnosis within dynamic systems

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi_i)$$

where \pi_i denotes the parents of X_i in the network.

Page 24

Assumptions - 1

Assume all variables take at least one time slice to impose an effect on another.
The more frequently a system generates data, the more likely this will be true.
Contemporaneous Links can be excluded from the DBN
The variables at time t will be considered independent of one another

Page 25

Representation

P pairs of the form (ParentVar, TimeLag)
Each pair represents a link from a node at a previous time slice to the node in question at time t.

Examples:
Variable 1: { (1,1); (2,2); (0,3) }
Variable 4: { (4,1); (2,5) }
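
The (ParentVar, TimeLag) pairs map naturally onto a dictionary keyed by child variable. The sketch below stores the two examples from this slide and looks up the lagged parent values a node depends on. The data layout (`series[v][t]` holding the state of variable v at time t) and the names `parent_values` and `max_lag` are assumptions made for illustration.

```python
# Structure from the slide: child variable -> list of (ParentVar, TimeLag).
structure = {
    1: [(1, 1), (2, 2), (0, 3)],
    4: [(4, 1), (2, 5)],
}


def parent_values(series, structure, var, t):
    """Return the states of var's parents at their lagged time slices."""
    values = []
    for parent, lag in structure.get(var, []):
        if t - lag < 0:
            raise ValueError("not enough history for this lag")
        values.append(series[parent][t - lag])
    return values


def max_lag(structure):
    """Largest time lag used anywhere in the structure."""
    return max(lag for pairs in structure.values() for _, lag in pairs)


# Made-up series: variable v takes the values 10*v, 10*v + 1, ...
series = {v: list(range(10 * v, 10 * v + 10)) for v in range(5)}
parents_of_1 = parent_values(series, structure, var=1, t=6)
```

Since every link points from an earlier time slice to time t, any set of pairs is a valid structure and no acyclicity check is needed.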

Page 26

Search Space

Given the first assumption and proposed representation the Search Space for each variable will be:

$$2^{N \times \mathit{MaxLag}}$$
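
To get a feel for this number, assume each of the N × MaxLag candidate (ParentVar, TimeLag) pairs is either present or absent, which gives 2^(N × MaxLag) possible parent sets per variable. The sketch below evaluates that count using N = 11 and MaxLag = 120, the figures quoted for the oil refinery data elsewhere in the talk. The function name `search_space_size` is hypothetical.

```python
def search_space_size(n_vars, max_lag):
    """Number of possible parent sets for one variable: 2 ** (N * MaxLag)."""
    return 2 ** (n_vars * max_lag)


size = search_space_size(n_vars=11, max_lag=120)
n_digits = len(str(size))  # number of decimal digits in the count
```

Even for this modest number of variables the count has hundreds of decimal digits, which is why exhaustive search is not an option.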

Page 27

Algorithm

Multivariate Time Series
Structure Search: Evolutionary Algorithms, Hill Climbing etc.
Parameter Calculation given structure
Dynamic Bayesian Network Library for Different Operating States
Explanation Algorithm (e.g. using Stochastic Simulation)
User

Page 28

Generating Synthetic Data

Page 29

Oil Refinery Data

Data recorded every minute
Hundreds of variables
Selected 11 interrelated variables
Discretised each variable into k states
Large Time Lags (up to 120 minutes between some variables)
Different Operating States

Page 30

Results

[Figure; labels: SOT, FF, TGF, TT, RinT]

Page 31

Explanations - using Stochastic Simulation
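
Stochastic simulation covers several sampling schemes, and the slides do not say which one was used. As a rough illustration of the idea, the sketch below uses plain rejection sampling on a made-up two-node network in which A at time t-2 influences B at time t: sample the network forward, keep only the samples that agree with the observation, and read the explanation off the survivors. All probabilities and names here are hypothetical.

```python
import random

P_A1 = 0.3                       # P(A(t-2) = 1), made-up prior
P_B1_GIVEN_A = {0: 0.2, 1: 0.9}  # P(B(t) = 1 | A(t-2)), made-up table


def explain(observed_b, n_samples, rng):
    """Estimate P(A(t-2) = 1 | B(t) = observed_b) by rejection sampling."""
    kept = hits = 0
    for _ in range(n_samples):
        # Sample the network forward in time: parent first, then child.
        a = 1 if rng.random() < P_A1 else 0
        b = 1 if rng.random() < P_B1_GIVEN_A[a] else 0
        if b == observed_b:  # keep only samples matching the observation
            kept += 1
            hits += a
    return hits / kept if kept else float("nan")


estimate = explain(observed_b=1, n_samples=20000, rng=random.Random(0))
# Exact posterior from Bayes' rule, for comparison with the estimate.
exact = (P_A1 * 0.9) / (P_A1 * 0.9 + (1 - P_A1) * 0.2)
```

Rejection sampling wastes every sample that disagrees with the evidence, so with many observed variables a weighted scheme such as likelihood weighting is usually preferred.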

Page 32

Explanations - using Stochastic Simulation

[Chart: P(y=1) (0 to 1) against time-x (1 to 97); series: SOF-SP, SOT, TT, BPF-SP, BPF]

Page 33

Explanation

Input (observations):

t - 0 : Tail Gas Flow in_state 0
t - 3 : Reboiler Temperature in_state 1

Output (explanation):

t - 7 : Top Temperature in_state 0 with probability=0.92
t - 54 : Feed Rate in_state 1 with probability=0.71
t - 75 : Reactor Temperature in_state 0 with probability=0.65

Page 34

Future Work

Exploring the use of different searches and metrics
Improving accuracy (e.g. different discretisation policies, continuous DBNs)
Using the library of DBNs in order to quickly classify the current state of a system
Automatically Detecting Changing Dependency Structure

Page 35

Acknowledgements

BBSRC
BP-AMOCO
British Council for Prevention of Blindness
EPSRC
Honeywell Hi-Spec Solutions
Honeywell Technology Center
Institute of Ophthalmology
Moorfields Eye Hospital
MRC

Page 36

Intelligent Data Analysis

X Liu
Department of Computer Science
Birkbeck College
University of London

Page 37

Intelligent Data Analysis

An interdisciplinary study concerned with effective analysis of data
Intelligent application of data analytic tools
Application of “intelligent” data analytic tools

Page 38

IDA Requires

Careful thinking at every stage of an analysis process (strategic aspects)
Intelligent application of relevant domain knowledge
Assessment and selection of appropriate analysis methods

Page 39

IDA Conferences

IDA-95, Baden-Baden
IDA-97, London
IDA-99, Amsterdam
IDA-2001, Lisbon

Page 40

IDA in Medicine and Pharmacology

IDAMAP-96, Budapest
IDAMAP-97, Nagoya
IDAMAP-98, Brighton
IDAMAP-99, Washington DC
IDAMAP-2000, Berlin

Page 41

Other IDA Activities

IDA Journal (Elsevier 1997)
Journal Special Issues (1997 -)
Introductory Books (Springer 1999)
The Dagstuhl Seminar (Germany 2000)
European Summer School (Italy 2000)
Special Sessions at Conferences

Page 42

Concluding Remarks

Strategies for data analysis and mining
Strategies for human-computer collaboration in IDA
Principles for exploring and analysing “big data”
Benchmarking interesting real-world data-sets as well as computational methods
A long term interdisciplinary effort

Page 43

The Screening Architecture

Page 44

Results from a GP Clinic