Post on 19-Dec-2015
1
Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data
Allan Tucker - Birkbeck College
Stephen Swift - Brunel University
Nigel Martin - Birkbeck College
Xiaohui Liu - Brunel University
2
Introduction
Present a methodology to group Multivariate Time Series (MTS) variables
An MTS is a series of observations of several variables recorded over time
Test on two real-world applications
Grouping - partitioning a set of objects into a number of mutually exclusive subsets
Many, if not all, grouping problems are NP-hard
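To give a feel for the size of the search space, the sketch below counts the ways a set of n variables can be partitioned into mutually exclusive groups (the Bell number). This is an illustration added here, not part of the original slides; the function name `bell_number` is hypothetical. The count shows why exhaustive enumeration is impractical, but it is not in itself an argument about NP-hardness.

```python
def bell_number(n: int) -> int:
    """Number of ways to partition n labelled objects into non-empty groups."""
    row = [1]  # first row of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            # Each further entry adds the entry above-left to the entry on the left.
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, bell_number(n))
```

Even for a few dozen variables the number of candidate groupings is far beyond what can be enumerated, which motivates a heuristic search.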
3
MTS Example
[Figure: example MTS, plotted as Magnitude against Time (time points 1 to 901)]
4
Grouping MTS - Introduction
Desirable to model MTS as a group of several smaller dimensional MTS
Decompose MTS into several smaller dimensional MTS based on dependencies in data
Large number of dependencies because one variable may affect another after a certain time lag
5
Grouping MTS - Methodology
[Diagram: flow from input to output]
One High Dimensional MTS (X)
1. Correlation Search (EP), producing Q, a list of triples numbered 1, 2, ..., Qlen: (xa, xb, lag), (xc, xd, lag), ..., (xe, xf, lag)
2. Grouping Algorithm (GGA), producing G, a grouping of the variables, e.g. {{0,3}, {1,4,5}, {2}}
Several Lower Dimensional MTS
6
Correlation Search
Spearman's Rank Correlation used
Entire Search Space is too large
Invalid Triples:
• Autocorrelations
• Duplicates irrespective of direction where lag = 0, e.g. (xi, xj, 0) and (xj, xi, 0)
Evolutionary Programming approach found to be the most efficient
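As a rough illustration of what is being evaluated during the correlation search, the sketch below scores a single triple (xi, xj, lag) with Spearman's rank correlation and rejects the two kinds of invalid triple listed above. It is not the authors' implementation: all function names are hypothetical, the lag direction (xi at time t against xj at time t + lag) is an assumption, and the Evolutionary Programming search itself is not shown.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def lagged_spearman(series_i, series_j, lag):
    """Spearman correlation of series_i at time t with series_j at time t + lag."""
    if lag:
        series_i, series_j = series_i[:-lag], series_j[lag:]
    return pearson(ranks(series_i), ranks(series_j))


def is_valid_triple(i, j, lag, seen):
    """Reject autocorrelations and lag-0 duplicates that differ only in direction."""
    if i == j:
        return False
    if lag == 0 and (j, i, 0) in seen:
        return False
    return True
```

Here `seen` is assumed to be a set of triples already accepted into the list of correlations.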
7
Grouping Genetic Algorithm - Representation and Operators
Previously compared and contrasted different GA representations and operators
Falkenauer's Crossover & Mutation ensure Schema Theory holds for grouping problems
Group 0: 0 3 4    Group 1: 1 2 6    Group 2: 5 7
Chromosome: 0 1 1 0 0 2 1 2 : 0 1 2
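The chromosome above can be read as two parts separated by the colon: one group label per variable, followed by the list of groups in use. A minimal sketch of decoding it into groups follows; the function name `decode_chromosome` is hypothetical and the crossover and mutation operators are not shown.

```python
def decode_chromosome(assignments, group_labels):
    """Map a per-variable list of group labels to {group label: [variable indices]}."""
    groups = {label: [] for label in group_labels}
    for variable, label in enumerate(assignments):
        groups[label].append(variable)
    return groups


if __name__ == "__main__":
    # Variable part and group part of the chromosome from the slide.
    print(decode_chromosome([0, 1, 1, 0, 0, 2, 1, 2], [0, 1, 2]))
```

Decoding the slide's chromosome places variables 0, 3, 4 in Group 0, variables 1, 2, 6 in Group 1, and variables 5, 7 in Group 2.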
8
Grouping - The Grouping Metric Properties
If Q is empty, then fitness maximised when each variable is in a separate group
If Q contains all pairings of variables (the entire search space), then fitness maximised when all variables in the same group
If data is from mixed set of MTS, fitness maximised when variables in the same group have as many correlations as possible in Q and variables in different groups have as few correlations as possible in Q
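The slides state the properties of the metric but not its formula. The toy score below is therefore only an assumed stand-in that happens to satisfy the three properties: each pair of variables sharing a group scores +1 if the pair appears in Q and -1 otherwise. The name `toy_grouping_fitness` and the scoring rule are hypothetical, and time lags are ignored.

```python
from itertools import combinations


def toy_grouping_fitness(groups, correlated_pairs):
    """Score a grouping: +1 per same-group pair found in Q, -1 per same-group pair not in Q."""
    linked = {frozenset(pair) for pair in correlated_pairs}
    score = 0
    for group in groups:
        for a, b in combinations(group, 2):
            score += 1 if frozenset((a, b)) in linked else -1
    return score


if __name__ == "__main__":
    q = [(0, 1), (2, 3)]  # assumed correlated pairs for four variables
    for grouping in ([[0, 1], [2, 3]], [[0, 1, 2, 3]], [[0], [1], [2], [3]]):
        print(grouping, toy_grouping_fitness(grouping, q))
```

With an empty Q any shared group is penalised, so singletons score best; with every pairing in Q a single group scores best; otherwise the score favours groups that follow the correlations.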
9
Oil Refinery Data
Oil Refinery Process in Scotland
Data recorded every minute
Hundreds of variables
Years of data available on repository
Selected 50 interrelated variables over 10000 time points
Large Time Lags (up to 120 minutes between some variables)
10
Visual Field Data
The interval between tests is about 6 months
Typically, 76 points are measured
The number of tests can range between 10 and 44
Values range between 60 = very good and 0 = blind
[Figure: map of the right-eye visual field test points, each numbered by nerve fibre bundle; labels: "Nerve Fibre Bundle (Right Eye)", "Usual Position of Blind Spot (Right Eye)"; markers: B, X]
11
Oil Refinery Data - Results (1)
Very rapid generation of Groups (seconds)
3 major groups discovered, 2 relating to the upper and lower trays of the column
Most of the single variables appear noisy
Used as a method for pre-processing data before model building where time is short
12
Oil Refinery Data - Results (2)
Group  Variable  Description
A      2         ABSORB REFLUX TRAY-1
B      17        ABSORB TAIL-GAS H2 CHROM
B      27        M/FRACT TOP REFLUX
C      22        DE-PROP FEED
D      25        WASH WATER
E      32        J17-COMP SUCTN. PRESSURE
F      40        ABSRB STRIPPER BOTTOM
G      4         ABSORB TAIL-GAS
G      24        C3/C4 EX CDU3
G      36        AUTO/MAN STN TO GAS MAIN
G      37        AUTO/MAN STN TO GIRBOTOL
H      0         FRESH FEED A-PASS
H      1         FRESH FEED B-PASS
H      3         DEBUT FEED EX ABSORB
H      5         ABSORB REFLUX TO TRAY-13
H      6         ABSORB STRIPR WATER LVL
H      8         REACTOR INLET A
H      9         REACTOR INLET B
H      10        SPONGE OIL
H      12        ABSRB LEAN-OIL TO TRAY11
H      18        DEBUT O/HDS PCT C2
H      20        DEBUT OVERHEADS - C2
H      21        F8 H/CARBON TO ABSORB
H      23        PROPENE PRODUCT EX J102
H      26        REFRIDGE A201 TOTAL FEED
H      28        GAS FLOW TO ABSORB
H      30        F8 I/STAGE DRUM LEVEL
H      33        ABSORB SPONGE OIL TRAY11
H      34        M/F TOP REFLUX PRESS CTRL
H      35        DEBUT DIF PRESS TRAY1/19
H      38        J17-COMP SPEED
H      41        C11/3 INLET
H      42        J17 SUCTN.
H      43        J17 I/STAGE
H      44        J17 DISCH
I      7         ABSORB PRESSURE CONTROL
I      11        ABSORB STRIPPER O/HDS
I      13        ABSRB STRIP RBOIL OUTLET
I      14        E4 OVERHEADS - C3
I      15        ABSORB TAIL-GAS PCT C3
I      16        ABSORB T/GAS METH CHROM
I      19        ABSORB. H2 METHANE RATIO
I      29        ABSORB BASE LEVEL
I      31        ABS/STRP TRAY-10
I      39        ABSORB STRIPPER TRAY-6
I      45        M/FRACT TOP REFLUX D/OFF
I      46        M/FRACT TOP TO C06
I      47        ABSORB STRIPPER FEED
I      48        ABSORB STRIPPER TRAY-36
I      49        ABSRB STRIP RBOIL OUTLET
13
Visual Field Data - Results (1) - Patient Group Comparison
Patients are ordered on Average Sensitivity
Patient 1 has the lowest and Patient 82 the highest
Graph goes from light (bottom right-hand corner, BRHC) to dark (top left-hand corner, TLHC)
14
Visual Field Data - Results (2)
High Sensitivity implies similar groups
• Small groups in general
• Points in the eye will be associated with similar nerve fibre bundles
Low Sensitivity implies dissimilar groups
• Large groups in general
• Different areas of the visual field may be deteriorating
15
Conclusions
Decomposing large, high-dimensional MTS is a challenging problem
Proposed methodology very encouraging
Oil Refinery Data: 3 relatively independent sub-systems rapidly identified
Visual Field Data: discovered groups offer ideal starting point for modelling as a VAR process
16
Future Work
Experimenting with new datasets
• Gene Expression Data
• EEG Data
Determining the ideal Parameters
• e.g. Qlen is very influential on final groupings
Combining the two stages (correlation search and grouping) into one incremental process
17
Acknowledgements
Engineering and Physical Sciences Research Council, UK
Moorfields Eye Hospital, UK
Honeywell Technology Centre, USA
Honeywell Hi-Spec Solutions, UK
BP-Amoco, UK