development of a biological data driven modelling and ... · figure 2 digitisation process: (a)...

1
Development of a biological data driven modelling and comparison framework Sam Benkwitz-Bedford, Jean-Bapste Cazier Centre for Computaonal biology, Cancer and Genomic Science Instute, University of Birmingham Problem: Given new video data, how can one best extrapolate and test the behaviours and rules observed within to effecvely further understanding of the observed environment. Transferral is inexact Quantavely comparing a large populaon over me is computaonally intense It is easy to generate bias Chosen Approach (Fig 1): Construct a framework to enable ongoing definion and manipulaon of models Use a range of suitable modelling techniques for comparison of suitability and applicability Cellular Automata (CA) for large populaons Agent Based Models (ABM) for spaal flexibility and behavioural representaon Roboc models for more dynamic environmental manipulaon and realisc failure cases Use the same framework to input direct trajectory data to digise video’s as a stac sub model Compare digised data, and models via analysis libraries within the framework and redefine Crowd source comparison between digised, model and original data and redefine Once suitable confidence is reached compare a subset via more computaonally intense techniques and draw quantave conclusions Figure 2 Digisaon process: (A) Several similar videos of ‘2D me-lapse microscopy image sequences of pc9 non-small lung cancer cells, incubated at 37 o c in 5% co2/air in a humidified chamber’ [3] were used as the inial target for model develop- ment. Aſter being processed trajectories are mapped onto substutes within the model representaon libraries ( B) (the small white dots). Amongst more general analycal data their movement over me is also tracked and finally displayed ( C). Colour intensity is directly representave of movement density in that posion over me relave to either the highest value overall or a chosen set value. In this case (B) and (C) use the same set top end value to show change over me. Data input/representaon: In the case of digisaon (Fig 2): A data video is broken down into individual frames and processed into ImageJ[1] as a set of images. The MosaicSuite[2] sub module is then used to idenfy individuals, linking their movements across me. Output as a series of individuals and their posions this data can then be used as input for the model. Posions are extrapolated into trajectories and speed at any given me point. Digised video data is therefore the iteraon through the moment to moment movement profile of a populaon with the same data gathering hooks as a derived model, easing comparison. In the case of a model derived via observaon, comparison and modificaon ( Fig3,4,5): At each me step enes have a chance of turning (modifying their vector) and then travel forward, environmental factors such as other enes, aracve Bezier curves and hard boundaries then act up on them. With highly controllable heterogeneous model development (Fig3), populaon and environmental manipulaon (Fig 4)can be used to define, modify and observe enes over a chosen me scale. Observaons can then be re-interpolated into a new definion of the model (Fig 5). Heterogeneous Model Development Figure 3 Heterogeneous Bezier curves: Within the model these curves can be designated in varying size, quanty and strength. The strength of retenon and deflecon can also be set to vary from point to point to a fine degree (currently 2px * 2px in 1500 *1500). (A) Several wide deflecve curves with medium strength can be applied, movement is most intense when in a small area between two and generally remains outside of them. (B) More numerous very thin reflecve curves create some small pockets of almost complete exclusion or intense enty retenon. (C) Aracve curves can also be used to deplete relave general movement by drawing enes within them. Figure 4 Populaon and environmental variance: (A) A single core value can be aributed to a group of individuals with a rate of variance. This might be used to create a populaon with core size of 10 and variaon of 9 ranging from 1-19 pixels in length. Several sub populaons with different set values can also be created, as such a user might instead create one with a range of 1-10 and another of 10-19. (B) Environmental effectors like restricve boundaries can also be applied to compress or manipulate a populaon. (C) By applying and removing such meta effectors changing environmental condions can be ap- proximated and observed in an analysis run or triggered manually in a live run. Figure 5 Replicaon of results via model redefinion: The results from digisaon of the cellular data video (Fig2 C) show strand like perturbaons. It was suggested that this may be due to construcon of the growth medium creang paths of least resistance. A simple set of test cases was then applied. (A) Low aracon Bezier curves were combined with replicave enes and a random inial distribuon of both. (B) The curves were then removed and a simple implementaon of cell built paths of least resistance applied. When aempng to turn; enes consider the strength of previous passage with an increased chance of modifying vector in that direcon. (C) Both cell based and environmentally based implementaons of reduced resistance can then be applied in conjuncon . While qualitave comparison and iteraon can be performed; the breadth of possible combinaons creates an exponenal search space. Future work is directed towards narrowing this space and improving the comparison process. Future work Computaonal comparison for candidate thinning before and aſter crowd sourced comparison: Computaonal complexity and accuracy usually scales directly with the number of comparison points. When inially thinning, a quick and general Evoluonary opmisaon algorithm can be applied with a fit- ness funcon comparing ‘simple’ points such as average turn rate. More in depth ‘costly’ analysis like topological paern matching can then be performed upon a sub set. Crowd sourced comparison: Once reduced down to a smaller subset crowd sourcing can be applied to generate quantate metrics for relave similarity. Scores given are retained as a set of session specific readings, this allows ’user confidence’ to be relave to the other comparisons made by a user in a relavely short me not a set criterion. Overall score can be measured as a comparison of separate user’s relave score sets. Comparisons judged as similar should sll stand out as having a higher ’confidence’ rate without the need to train and set specific criterion. Please feel free to start comparing, the website is accessible while on the UOB network from: hp://172.31.10.103/dbCompare/Comparison.jsp Figure 1 Framework overview: Beginning with the digisaon of data videos and definion of a model, the framework com- pares original, digised and model derived data. Ulising both quantave analycal and qualitave crowd sourced meth- ods to define a simulacrum. Insight can be gained in both the applicaon of rules required for similar development and also predicve modelling via manipulaon. Populaon and environmental manipulaon [1] ImageJ. hps://imagej.net/Welcome [2] MosaicSuite. hp://mosaic.mpi-cbg.de [3] Jacopo Credi. Collecve behaviour and sgmergy in populaons of cancer cells, 2015.

Upload: others

Post on 19-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Development of a biological data driven modelling and ... · Figure 2 Digitisation process: (A) Several similar videos of ‘2D time-lapse microscopy image sequences of pc9 non-small

Development of a biological data driven modelling

and comparison framework

Sam Benkwitz-Bedford, Jean-Baptiste Cazier Centre for Computational biology, Cancer and Genomic Science Institute, University of Birmingham

Problem:

Given new video data, how can one best extrapolate and test the behaviours and rules observed within to

effectively further understanding of the observed environment.

Transferral is inexact

Quantitatively comparing a large population over time is computationally intense

It is easy to generate bias

Chosen Approach (Fig 1):

Construct a framework to enable ongoing definition and manipulation of models

Use a range of suitable modelling techniques for comparison of suitability and applicability

Cellular Automata (CA) for large populations

Agent Based Models (ABM) for spatial flexibility and behavioural representation

Robotic models for more dynamic environmental manipulation and realistic failure cases

Use the same framework to input direct trajectory data to digitise video’s as a static sub model

Compare digitised data, and models via analysis libraries within the framework and redefine

Crowd source comparison between digitised, model and original data and redefine

Once suitable confidence is reached compare a subset via more computationally intense techniques and

draw quantitative conclusions

Figure 2 Digitisation process: (A) Several similar videos of ‘2D time-lapse microscopy image sequences of pc9 non-small lung cancer cells, incubated at 37oc in 5% co2/air in a humidified chamber’ [3] were used as the initial target for model develop-ment. After being processed trajectories are mapped onto substitutes within the model representation libraries (B) (the small white dots). Amongst more general analytical data their movement over time is also tracked and finally displayed (C). Colour intensity is directly representative of movement density in that position over time relative to either the highest value overall or a chosen set value. In this case (B) and (C) use the same set top end value to show change over time.

Data input/representation:

In the case of digitisation (Fig 2):

A data video is broken down into individual frames and processed into ImageJ[1] as a set of images.

The MosaicSuite[2] sub module is then used to identify individuals, linking their movements across

time.

Output as a series of individuals and their positions this data can then be used as input for the model.

Positions are extrapolated into trajectories and speed at any given time point.

Digitised video data is therefore the iteration through the moment to moment movement profile of a

population with the same data gathering hooks as a derived model, easing comparison.

In the case of a model derived via observation, comparison and modification (Fig3,4,5):

At each time step entities have a chance of turning (modifying their vector) and then travel forward,

environmental factors such as other entities, attractive Bezier curves and hard boundaries then act up

on them.

With highly controllable heterogeneous model development (Fig3), population and environmental

manipulation (Fig 4)can be used to define, modify and observe entities over a chosen time scale.

Observations can then be re-interpolated into a new definition of the model (Fig 5).

Heterogeneous Model Development

Figure 3 Heterogeneous Bezier curves: Within the model these curves can be designated in varying size, quantity and

strength. The strength of retention and deflection can also be set to vary from point to point to a fine degree (currently 2px

* 2px in 1500 *1500). (A) Several wide deflective curves with medium strength can be applied, movement is most intense

when in a small area between two and generally remains outside of them. (B) More numerous very thin reflective curves

create some small pockets of almost complete exclusion or intense entity retention. (C) Attractive curves can also be used to

deplete relative general movement by drawing entities within them.

Figure 4 Population and environmental variance: (A) A single core value can be attributed to a group of individuals with a

rate of variance. This might be used to create a population with core size of 10 and variation of 9 ranging from 1-19 pixels in

length. Several sub populations with different set values can also be created, as such a user might instead create one with a

range of 1-10 and another of 10-19. (B) Environmental effectors like restrictive boundaries can also be applied to compress

or manipulate a population. (C) By applying and removing such meta effectors changing environmental conditions can be ap-

proximated and observed in an analysis run or triggered manually in a live run.

Figure 5 Replication of results via model redefinition: The results from digitisation of the cellular data video (Fig2 C) show

strand like perturbations. It was suggested that this may be due to construction of the growth medium creating paths of

least resistance. A simple set of test cases was then applied. (A) Low attraction Bezier curves were combined with replicative

entities and a random initial distribution of both. (B) The curves were then removed and a simple implementation of cell

built paths of least resistance applied. When attempting to turn; entities consider the strength of previous passage with an

increased chance of modifying vector in that direction. (C) Both cell based and environmentally based implementations of

reduced resistance can then be applied in conjunction . While qualitative comparison and iteration can be performed; the

breadth of possible combinations creates an exponential search space. Future work is directed towards narrowing this space

and improving the comparison process.

Future work

Computational comparison for candidate thinning before and after crowd sourced comparison:

Computational complexity and accuracy usually scales directly with the number of comparison points.

When initially thinning, a quick and general Evolutionary optimisation algorithm can be applied with a fit-

ness function comparing ‘simple’ points such as average turn rate.

More in depth ‘costly’ analysis like topological pattern matching can then be performed upon a sub set.

Crowd sourced comparison:

Once reduced down to a smaller subset crowd sourcing can be applied to generate quantitate metrics for

relative similarity.

Scores given are retained as a set of session specific readings, this allows ’user confidence’ to be relative to

the other comparisons made by a user in a relatively short time not a set criterion.

Overall score can be measured as a comparison of separate user’s relative score sets.

Comparisons judged as similar should still stand out as having a higher ’confidence’ rate without the need

to train and set specific criterion.

Please feel free to start comparing, the website is accessible while on the UOB network from:

http://172.31.10.103/dbCompare/Comparison.jsp

Figure 1 Framework overview: Beginning with the digitisation of data videos and definition of a model, the framework com-

pares original, digitised and model derived data. Utilising both quantitative analytical and qualitative crowd sourced meth-

ods to define a simulacrum. Insight can be gained in both the application of rules required for similar development and also

predictive modelling via manipulation.

Population and environmental manipulation

[1] ImageJ. https://imagej.net/Welcome

[2] MosaicSuite. http://mosaic.mpi-cbg.de

[3] Jacopo Credi. Collective behaviour and stigmergy in populations of cancer cells, 2015.