Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data
Emma Peeling, Allan Tucker
Centre for Intelligent Data Analysis
Brunel University, West London
Cross-Section Data
Studies often involve data sampled from a cross-section of a population, especially in biological and medical studies.
For example, collecting medical information on patients suffering from a particular disease and on healthy controls.
Essentially, these studies show a “snapshot” of the disease process.
Cross-Section Data
Many processes are inherently temporal in nature.
Previously healthy people can develop a disease over time, passing through different stages of severity.
Modelling the development of such processes usually requires longitudinal data.
Cross-Section vs Longitudinal
[Figure: a Longitudinal Study contrasted with a Cross-Section Study relative to disease Onset]
Pseudo Time-Series Models
In this presentation we explore:
Ordering data based upon Minimum Spanning Trees and PQ-Trees (Rifkin et al. 2000)
Treating this ordered data as a “Pseudo Time-Series”
Using Pseudo Time-Series to build temporal models
Testing with a dynamic Bayesian network model for classifying medical data and gene expression data
Multi-Dimensional Scaling
Can be used to visualise the distances between data points and the pathways through them.
Here we use classical MDS, which is metric-based (Euclidean distance).
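To make the MDS step concrete, here is a minimal stdlib-only sketch of classical (Torgerson) MDS under Euclidean distance. It double-centres the squared distance matrix and extracts the leading eigenvectors by power iteration with deflation. The function names and the power-iteration approach are illustrative assumptions, not the authors' code; in practice a linear-algebra library's symmetric eigendecomposition would be used instead.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classical_mds(points, dims=2, iters=500, seed=0):
    """Embed points in `dims` dimensions with classical (Torgerson) MDS.

    Builds B = -1/2 * J D^2 J (double-centred squared distances), then takes
    the top `dims` eigenvectors of B by power iteration with deflation.
    """
    n = len(points)
    d2 = [[euclidean(p, q) ** 2 for q in points] for p in points]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # d2 is symmetric, so column means equal row means.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eig = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eig, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate: remove this component before finding the next one.
        b = [[b[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For a 3D view of the samples one would call `classical_mds(samples, dims=3)`; the embedding is unique only up to rotation and reflection, so only the relative distances are meaningful.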
Minimum Spanning Tree
Connects all nodes in the graph using the set of links whose total weight is minimal.
[Figure: a weighted graph and its MST]
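A sketch of the MST step, assuming the weighted graph is the complete graph over all samples with Euclidean edge weights (the slides do not spell out how the graph is built). It uses Prim's algorithm in its simple O(n²) form; Kruskal's algorithm would yield a tree of the same total weight. The function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(points):
    """Return MST edges (i, j, weight) of the complete graph whose edge
    weights are Euclidean distances between samples (Prim's algorithm)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each node into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the links from u to every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                w = euclidean(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges
```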
PQ-Tree
PQ-Trees are used to encode partial orderings on variables.
P nodes: the children can be in any order.
Q nodes: the order of the children can only be reversed.
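The sketch below enumerates every leaf ordering that a PQ-tree permits, which makes the P-node and Q-node rules concrete. The encoding is a hypothetical choice for illustration: a leaf is any non-tuple value, and an internal node is a `(kind, children)` tuple with kind `"P"` or `"Q"`. It does not construct the tree from an MST as Rifkin et al. (2000) do.

```python
from itertools import permutations


def frontiers(node):
    """Yield every leaf ordering permitted by a PQ-tree.

    A leaf is any non-tuple value; an internal node is (kind, children),
    where kind "P" allows any order of the children and kind "Q" allows only
    the given order or its reverse.
    """
    if not isinstance(node, tuple):
        yield [node]
        return
    kind, children = node
    if kind == "P":
        orders = permutations(children)
    else:
        orders = [tuple(children), tuple(reversed(children))]
    for order in orders:
        yield from _combine(list(order))


def _combine(children):
    # Concatenate one frontier from each child, in order (Cartesian product).
    if not children:
        yield []
        return
    for head in frontiers(children[0]):
        for tail in _combine(children[1:]):
            yield head + tail
```

Because a P-node with m children admits m! orderings, exhaustive enumeration quickly becomes infeasible for real data, which motivates a heuristic search over the P-nodes instead.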
Dynamic Bayesian Network Classifiers
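DBNCs are used to calculate $P(C \mid X_t, X_{t-1})$, the probability of class $C$ given the observations at the current time slice $t$ and the previous slice $t-1$.
Here, we use the DBNC to model the Pseudo Time-Series and classify the data.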
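As an illustration of what computing $P(C \mid X_t, X_{t-1})$ can involve, here is a minimal classifier over discrete variables (the Future Work slide suggests the data were discretised). The network structure is an assumption, not necessarily the authors': the class is a parent of every variable at time $t$, and each variable also depends on its own value at $t-1$, so the posterior is proportional to $P(C)\prod_i P(x_t^i \mid x_{t-1}^i, C)$. The class name `TransitionNBClassifier` and the Laplace smoothing are hypothetical choices.

```python
import math
from collections import Counter


class TransitionNBClassifier:
    """Hypothetical minimal DBN classifier over discrete variables.

    Assumed structure: C -> X_t[i] and X_{t-1}[i] -> X_t[i] for every i, so
        P(C | x_t, x_{t-1}) is proportional to P(C) * prod_i P(x_t[i] | x_{t-1}[i], C).
    """

    def __init__(self, values, alpha=1.0):
        self.values = list(values)  # possible discrete values of every variable
        self.alpha = alpha          # Laplace smoothing pseudo-count

    def fit(self, transitions):
        """transitions: iterable of (x_prev, x_curr, label) taken from
        consecutive samples of a (pseudo) time-series."""
        self.class_counts = Counter()
        self.trans = Counter()  # (label, i, prev_value, curr_value) -> count
        self.ctx = Counter()    # (label, i, prev_value) -> count
        for prev, curr, c in transitions:
            self.class_counts[c] += 1
            for i, (a, b) in enumerate(zip(prev, curr)):
                self.trans[(c, i, a, b)] += 1
                self.ctx[(c, i, a)] += 1
        return self

    def predict_proba(self, prev, curr):
        """Return {label: P(label | curr, prev)} under the assumed structure."""
        total = sum(self.class_counts.values())
        k = len(self.values)
        log_scores = {}
        for c, n_c in self.class_counts.items():
            s = math.log(n_c / total)
            for i, (a, b) in enumerate(zip(prev, curr)):
                num = self.trans[(c, i, a, b)] + self.alpha
                den = self.ctx[(c, i, a)] + self.alpha * k
                s += math.log(num / den)
            log_scores[c] = s
        # Normalise in log space for numerical stability.
        m = max(log_scores.values())
        z = sum(math.exp(v - m) for v in log_scores.values())
        return {c: math.exp(v - m) / z for c, v in log_scores.items()}
```

Training pairs would come from neighbouring samples in the derived pseudo temporal ordering, each labelled with the class of the current sample.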
Pseudo Time-Series Models
In Summary:
1: Input: cross-section data
2: Construct a weighted graph and its MST
3: Construct a PQ-tree from the MST
4: Derive the Pseudo Time-Series from the PQ-tree, using hill-climbing search on the P-nodes to minimise sequence length
5: Build a DBNC model using the pseudo temporal ordering of the samples
6: Output: a temporal model of the cross-section data
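Step 4 can be sketched as follows, under two assumptions the slides do not state: "sequence length" is the total Euclidean distance between consecutive samples in the ordering, and a hill-climbing move swaps two children of one P-node (Q-nodes are left untouched). The tree encoding, `["P", children]` or `["Q", children]` lists whose leaves are sample indices into `points`, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequence_length(order, points):
    """Assumed definition: total distance between consecutive samples."""
    return sum(euclidean(points[a], points[b]) for a, b in zip(order, order[1:]))


def frontier(node):
    """Leaf order of a tree whose internal nodes are [kind, children] lists."""
    if not isinstance(node, list):
        return [node]
    return [leaf for child in node[1] for leaf in frontier(child)]


def p_nodes(node):
    """All P-nodes in the tree, found by depth-first traversal."""
    if not isinstance(node, list):
        return []
    found = [node] if node[0] == "P" else []
    for child in node[1]:
        found += p_nodes(child)
    return found


def hill_climb(tree, points):
    """Swap children of P-nodes while the sequence length strictly decreases.

    Modifies `tree` in place; returns (order, length) at a local optimum."""
    best = sequence_length(frontier(tree), points)
    improved = True
    while improved:
        improved = False
        for node in p_nodes(tree):
            kids = node[1]
            for i in range(len(kids)):
                for j in range(i + 1, len(kids)):
                    kids[i], kids[j] = kids[j], kids[i]
                    length = sequence_length(frontier(tree), points)
                    if length < best - 1e-12:
                        best, improved = length, True
                    else:
                        kids[i], kids[j] = kids[j], kids[i]  # undo the swap
    return frontier(tree), best
```

Since every accepted move strictly shortens the sequence, the search always terminates, but only at a local optimum.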
The Datasets
B-Cell Microarray Data: 3 classes of B-cell data from a number of patients, pre-ordered by an expert into a pseudo time-series.
Visual Field Test Data: one large cross-section study of healthy and glaucomatous eyes, plus one longitudinal study for testing the models.
B-Cell: MDS & Pseudo Time-Series
Plots show the discovered path in 3D and the classification of the B-Cell data in 2D.
B-Cell Accuracy
The plot shows mean accuracy and variance over repeated cross-validation.
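A minimal sketch of how mean accuracy and variance over repeated cross-validation could be computed. The `fit_predict` callback stands in for any classifier (such as a DBNC trained on a pseudo time-series); the fold assignment by striding over a shuffled index list and the defaults `k=5`, `repeats=10` are assumptions, since the slides do not give the number of folds or repeats. It requires `k` to be no larger than the number of samples.

```python
import random
import statistics


def repeated_kfold_accuracy(X, y, fit_predict, k=5, repeats=10, seed=0):
    """Mean and population variance of fold accuracies over `repeats`
    independently shuffled k-fold splits.

    fit_predict(train_X, train_y, test_X) -> list of predicted labels.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k near-equal folds
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            correct = sum(p == y[i] for p, i in zip(preds, test))
            scores.append(correct / len(test))
    return statistics.mean(scores), statistics.pvariance(scores), scores
```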
Expert Knowledge

| Method | Sequence length | Ordering |
|---|---|---|
| Biologist | 512.0506 | 1-26 |
| PQ-tree | 528.9907 | 1-6, 7, 9, 8, 11, 10, 12-18, 26, 19, 21, 20, 22-25 |
| PQ-tree and hill-climb | 521.1865 | 1-18, 26, 19-25 |
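One illustrative way to compare the discovered orderings with the biologist's is to expand the range notation used in the orderings above and count the sample pairs that appear in the opposite order (the Kendall tau distance to 1..26). This measure is our own addition for illustration, not one reported on the slide; the helper names are hypothetical.

```python
def expand(spec):
    """Expand range notation such as "1-3,5" into [1, 2, 3, 5]."""
    order = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            order.extend(range(lo, hi + 1))
        else:
            order.append(int(part))
    return order


def inversions(order):
    """Kendall tau distance to the sorted ordering: the number of sample
    pairs that appear in the opposite order."""
    return sum(1 for i in range(len(order)) for j in range(i + 1, len(order))
               if order[i] > order[j])
```

For example, `inversions(expand("1-6,7,9,8,11,10,12-18,26,19,21,20,22-25"))` evaluates to 10 and `inversions(expand("1-18,26,19-25"))` to 7, so the hill-climbed ordering is also closer to the expert ordering by this measure.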
Visual Field: MDS & Pseudo Time-Series
Plots show the path found for the Visual Field (VF) data in 3D and the classification of the VF data in 2D.
VF Accuracy
The plot shows mean accuracy and variance over repeated train/test runs.
Related Work: Semi-Supervised Methods
Some data points are labelled with classes.
These are used to assist the classification of the others in an incremental manner.
By contrast, Pseudo MTS imposes an order on the data as well as a distance between data points, which allows future states to be predicted.
Conclusions
Cross-section data usually models a snapshot of a process.
Longitudinal data is usually needed to model its temporal nature.
Here we use ordering methods to create Pseudo Time-Series models.
Early results on medical and biological data are promising.
Future Work
Dealing with outliers in data space
Multiple trajectories (e.g. in the VF data)
Normalisation (rather than discretisation)
Combining a number of longitudinal and cross-section studies
Multiple Trajectories
Acknowledgements
Thanks to:
David Garway-Heath, Moorfields Eye Hospital, London
Paul Kellam, University College London