ECONOFISICA Università degli Studi di Torino
Epidemics Forecasting Challenge Luca Colombo July 4, 2018
Abstract
The RAPIDD Ebola forecasting challenge is an innovative work inspired by the West African
Ebola crisis of 2014-2015, involving 16 international academic teams and US government
agencies. The participants were invited to predict 140 epidemiological targets across five
different time points of four synthetic Ebola outbreaks. Here I present the results of a more
modest work, based on the idea proposed in the RAPIDD Ebola forecasting challenge
paper: evaluating the performance of a simple model on a synthetic outbreak dataset. Both
the dataset and the model were created in NetLogo, a useful tool for agent-based
modelling and simulation. The synthetic outbreak dataset is generated by a complex
metapopulation stochastic SIR/SIS scenario in which the agents diffuse on a scale-free
network and spread the infection. The model is a simple deterministic SIR evaluated with
different degrees of information about the spatial structure (the network features), the
interventions of other agents (medics) and the diffusion probability. The goal of this work is
to understand how this information affects the forecasting and, by introducing it
gradually, to compare the results with the scenario’s dataset. The simpler models predict
the scenario poorly, whereas the more complex ones go in the right direction, correctly
predicting the peak timing and amplitude, the outbreak duration and the overall trend.
However, the missing information, such as money making and spending and the outbreak
alert, causes some parameters to differ from the scenario’s. Introducing a
birth/death mechanism as well as immunization processes would be an interesting step
forward. A key element to improve is the reliability of the model: removing excessive
randomness in some mechanisms would avoid accidents such as multiple isolated
clusters or slow or non-existent outbreak starts.
1. Introduction
The development of computational and mathematical models is crucial to prevent and
control emerging infectious diseases and to guide intervention strategies. For example,
in the 2014-2015 West African Ebola Virus Disease (EVD) epidemic a variety of
models were used to generate real-time predictions of the unfolding outbreak and help the
authorities fight the disease.
At the end of the West African Ebola epidemic, in spring 2015, a workshop was organized
by the RAPIDD program led by the Fogarty International Center of the National Institutes of
Health (NIH). The aim of the workshop was to analyze and discuss the models used during
the outbreak and find possible improvements in forecasting accuracy. The participants
decided that the best way to do so was to build a forecasting challenge relying on synthetic
Ebola datasets in a controlled and systematic environment, and to evaluate the models’
prediction performance and how it scales with epidemiological complexity and data availability.
The synthetic epidemiological datasets were generated using a spatially structured,
stochastic, agent-based model at the level of the single household that integrates detailed
data on Liberia’s demography. The model was used to generate four outbreak scenarios with an
increasing level of complexity in terms of epidemiology, layered interventions, data
availability and reporting noise. The goal of the participants was to predict 140 targets in
total across all scenarios and time points.
This work takes inspiration from a recently published paper (The RAPIDD ebola forecasting
challenge: Synthesis and lessons learnt [1]) and proposes to do the same in a much simpler
environment: we build a complex base scenario, extract data including incidence, final size
of the outbreak, peak size and peak timing, and build a new simulation that tries to predict
these features. In the new simulation it is possible to gradually make
hypotheses on the infection rate and recovery rate, on the spatial structure, on the travel
probability and so on. However, some features of the base scenario are deliberately
left unknown to the model. We will evaluate their influence and whether they change the
parameters we look for. The goal of this work is to understand how this information affects
the forecasting and, by introducing it gradually, to compare the results with the scenario’s
dataset.
NetLogo is used to build the ABM simulations. This environment makes it easy to see
graphically the structure of the space in which the agents interact and move, and the
incidence of the infection through time.
2. Space structure
Scenario
A network structure is used instead of letting the agents move randomly in the NetLogo
world. This gives control over the scenario structure through a mathematically formalized
instrument while remaining flexible. In a real scenario the nodes represent the
cities and the links the travel routes along which agents move between cities. The more
links a node has, the more important it is and the more likely it is to be a hub in which
agents interact, spread the infection or get cured. Conversely, more isolated nodes generally
have a low population and poor access to cures. In the scenario the infected people are
spawned in peripheral and isolated nodes (the first thing the model doesn’t know about).
How can a real scenario be reproduced in a network-based structure? In the real world big
cities are far more connected than average-sized cities, and orders of magnitude more than
small ones. New nodes are therefore more likely to link to nodes with a higher degree (already
highly connected). In the Barabási-Albert paper [2] this is called preferential attachment.
A growing network with preferential attachment has a fundamental property:
scale invariance. Such networks are called scale-free, and they are convenient in our
case because their structure is not arbitrary (not decided by the author): each realization is
different but has specific features, making them flexible and reproducible. (Figure 2.1, 2.3)
2.1 Preferential attachment code. Each
node is more likely to link to a more linked
node.
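The preferential attachment rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NetLogo code of Figure 2.1: the function names are hypothetical, and each new node is assumed to add exactly one link.

```python
import random

def preferential_attachment(n_nodes, seed=None):
    """Grow a network node by node; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]      # start from two linked nodes
    endpoints = [0, 1]    # a node appears here once for every link it has
    for new in range(2, n_nodes):
        # Picking a random link end is a degree-proportional choice.
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints += [new, target]
    return edges

def degrees(n_nodes, edges):
    """Return the degree of every node."""
    deg = [0] * n_nodes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = preferential_attachment(150, seed=1)
print(max(degrees(150, edges)))  # degree of the largest hub
```

Choosing from the list of link ends avoids computing explicit probabilities: a node with degree k appears k times in the list, so it is k times more likely to be selected.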
To account for the “shortcuts” of a real-world scenario (such as
airlines, trains and so on), a rewiring probability can be set for the built
network. This feature gives even more flexibility to the model because we can gradually
transform a scale-free network into a random-like network.
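A possible way to apply a rewiring probability is sketched below in Python. The exact rewiring rule of the original simulation is not given in the text, so this assumes a common variant: each link keeps its first end and, with probability p, moves its second end to a random node not already linked to the first. The function name and signature are hypothetical.

```python
import random

def rewire(edges, n_nodes, p, seed=None):
    """Return a copy of `edges` where each link is rewired with probability p."""
    rng = random.Random(seed)
    present = {frozenset(e) for e in edges}   # links currently in the network
    result = []
    for a, b in edges:
        if rng.random() < p:
            # New end: any node that is not `a` and not already linked to `a`.
            candidates = [c for c in range(n_nodes)
                          if c != a and frozenset((a, c)) not in present]
            if candidates:
                c = rng.choice(candidates)
                present.discard(frozenset((a, b)))
                present.add(frozenset((a, c)))
                b = c
        result.append((a, b))
    return result

ring = [(i, (i + 1) % 10) for i in range(10)]
print(rewire(ring, 10, 0.3, seed=2))
```

With p = 0 the network is unchanged; as p approaches 1 the original structure is progressively lost, which is the gradual transition towards a random-like network described above.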
Model
Switching from the scenario to the model, the same number of nodes is kept, with
identical positions. Obviously, anyone who wanted to predict the parameters of a real outbreak
would know the positions of the cities and their infrastructure. In this case, however, the
links represent the diffusion vectors, which are unknown.
In this work two possible options are considered for rebuilding the network: the homogeneous
hypothesis and the heterogeneous hypothesis.
The first one assumes that a person on a random node sees an average number of
possible destinations (nodes) to travel to: the average degree. The network is built on the
assumption that a person is most likely to travel to nearby nodes, so the node he is on is
linked to these nodes, forming a cluster in which almost every node is connected (depending
on the fixed average degree). A rewiring probability can also be set to create the shortcuts
that exist in a real-world scenario. If the rewiring probability isn’t high enough, the risk is
that one or more clusters are isolated. Conversely, if it is too high, the preferred
destinations would be all the nodes rather than the closer ones, which is unrealistic. The
degree distribution is approximately Gaussian. (Figure 2.2, 2.4)
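The homogeneous construction and the risk of isolated clusters can be illustrated with a short Python sketch. It is not the clustering code of Figure 2.2: the names are hypothetical, and it assumes each node links to its k nearest nodes, so the resulting mean degree lies between k and 2k rather than matching a fixed average degree exactly.

```python
import math

def nearest_links(positions, k):
    """Link every node to its k nearest nodes (undirected, no duplicates)."""
    edges = set()
    for i, p in enumerate(positions):
        others = sorted((j for j in range(len(positions)) if j != i),
                        key=lambda j: math.dist(p, positions[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def count_components(n_nodes, edges):
    """Count connected components; a value above 1 means isolated clusters."""
    adjacency = {v: [] for v in range(n_nodes)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, components = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:                      # depth-first visit of one component
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adjacency[v])
    return components

# Two pairs of nearby cities far from each other: without rewiring they stay apart.
positions = [(0, 0), (1, 0), (10, 0), (11, 0)]
edges = nearest_links(positions, k=1)
print(edges, count_components(len(positions), edges))
```

Checking the number of components after building (and rewiring) the network is one simple way to detect the isolated clusters mentioned above before running an outbreak on it.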
The second option is that the network is heterogeneous: the number of possible destinations
(nodes) a person sees differs so much from one node to another that ignoring the
degree distribution would be incorrect.
2.2 Clustering code. Choices are the nodes at the minimum distance
2.3 Example of the degree distribution of the nodes of a scale-free network. n_nodes = 150
2.4 Example of the degree distribution of the same network in figure 2.3 with n_nodes = 150,
average-degree = 3 and rewiring probability = 0.30. It is clear how the distribution has its center on 3 with
low variance.
2.5 Example of the degree distribution of the same network in figure 2.3 with n_nodes = 150 and rewiring
probability = 0.30. The distribution is obviously similar to the scale-free one with differences due to the
rewiring probability.
To build the network, the degree of each node of the scale-free network is used. In a
real-world scenario we could know the number of people going into or out of a city without
knowing their destinations. So the degree fixes the number of links of the node, but these are
created with the same criterion as in the homogeneous network (nearby nodes are
favoured) and rewired with a rewiring probability. (Figure 2.5)
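One way to rebuild a network from known node degrees and positions is sketched below in Python. This is an assumption-laden illustration, not the author’s procedure: nodes are processed from highest to lowest target degree, each one links to the nearest nodes that still have free links, and some targets may remain unmet. The function name is hypothetical.

```python
import math

def rebuild_from_degrees(positions, target_degree):
    """Greedy rebuild: each node links to its nearest nodes until it reaches
    its target degree, never exceeding the target of the other end."""
    n = len(positions)
    remaining = list(target_degree)       # links each node can still receive
    edges = set()
    for i in sorted(range(n), key=lambda v: -target_degree[v]):
        by_distance = sorted((j for j in range(n) if j != i),
                             key=lambda j: math.dist(positions[i], positions[j]))
        for j in by_distance:
            if remaining[i] == 0:
                break
            edge = (min(i, j), max(i, j))
            if remaining[j] > 0 and edge not in edges:
                edges.add(edge)
                remaining[i] -= 1
                remaining[j] -= 1
    return sorted(edges)

positions = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(rebuild_from_degrees(positions, [1, 2, 2, 1]))
```

The rewiring step would then be applied to the returned links to add the shortcuts.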
This work tries to point out the differences in the infection spread mechanism and in the
intervention mechanism when the agents diffuse in a scale-free network, in a random network
under the homogeneous hypothesis and in a random scale-free-like network under the
heterogeneous hypothesis.
3. Outbreak scenario
3.1 People distribution in the
base scenario (Gaussian)
3.2 People distribution on the
Random Network (Gaussian)
In the scenario and in every model the people are generated randomly distributed over the
nodes, with a minimum of 2 people per node. At the start the big cities (nodes with high
degree) have almost the same number of people as the small cities. Their importance is due
to the in/out flow: after 1 tick they tend to hold twice as many people (if not more),
resulting in many more interactions than in the isolated ones.
One difference between the scenario and a model is that the former generates the infected
people on isolated nodes whereas the latter generates them uniformly distributed. This
could result in a delayed outbreak in the first case, especially if the rewiring probability is
low: after 1 or 2 ticks the infected people might not yet have reached the hubs, delaying the
spread of the infection. Conversely, in the model the infected people, being evenly
distributed, could spread the infection too fast if the rewiring probability is too high,
creating strong discrepancies between the model and the scenario.
The model knows only the total number of people and the initial number of infected people
(initial-outbreak-size).
Travel
The virus spreads following the metapopulation model setup. In each node the people
interact with each other, and then every person has an average probability of diffusing to
another node (travel-tendency).
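The travel part of a metapopulation step can be sketched in Python as follows. This is a minimal illustration, assuming each person moves to a uniformly chosen neighbouring node with probability travel_tendency; the function and variable names are hypothetical.

```python
import random

def travel_step(location, neighbours, travel_tendency, rng):
    """Return the new node of every person after one travel step.

    location[k] is the node person k is on; neighbours[node] lists the
    nodes linked to `node`.
    """
    new_location = []
    for node in location:
        # People on a node without links cannot travel.
        if neighbours[node] and rng.random() < travel_tendency:
            node = rng.choice(neighbours[node])
        new_location.append(node)
    return new_location

neighbours = {0: [1, 2], 1: [0], 2: [0]}
print(travel_step([0, 0, 1, 2], neighbours, 0.4, random.Random(0)))
```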
There is a feature that the model doesn’t take into consideration: the outbreak mechanism.
When the overall number of infected people exceeds a so-called outbreak-threshold, it
triggers an outbreak alert that changes the behaviour of all the agents: the medics and the
agents start to travel, whereas the normal people (susceptible, infected, recovered and
cured) travel much more easily (5 times the base travel-tendency).
Lastly, there is another feature that the model does not take into consideration: the money
mechanism. In real life everybody starts with a certain amount of money and works his
way into society. In the scenario every agent is generated with a random Poisson-distributed
amount of money and makes money at every tick in which he doesn’t travel. In order to travel
he needs to pay a price, so when he doesn’t have enough money he stays in
the node. This creates a micro-delay for the poorer agents that changes the overall
movement speed.
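The effect of the money mechanism on a single agent can be sketched in Python. The travel price and the amount earned per tick are not given in the text, so `price` and `wage` below are hypothetical parameters, as is the function name.

```python
def money_step(money, wants_to_travel, price, wage):
    """Return (new_money, travelled) for one agent and one tick.

    An agent who wants to travel and can afford the price pays and moves;
    otherwise the agent stays in the node and earns `wage`.
    """
    if wants_to_travel and money >= price:
        return money - price, True
    return money + wage, False

# A poor agent has to wait one tick before being able to travel.
money, travelled = money_step(2, True, price=3, wage=1)
print(money, travelled)
money, travelled = money_step(money, True, price=3, wage=1)
print(money, travelled)
```

The first call leaves the agent in the node with one more unit of money; only the second call lets the agent travel, which is the micro-delay described above.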
Interaction
In a SIR model susceptible people have a probability of becoming infected proportional
to the virus-spread-chance multiplied by the probability of encountering an infected person
in a node, and the infected agents have a recovery-chance of becoming recovered.
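A stochastic version of this interaction rule can be sketched in Python. It assumes that the probability of encountering an infected person is the infected fraction of the node, and that all agents are updated synchronously from the counts at the start of the tick; the function name and the state labels "S", "I", "R" are choices made for this sketch.

```python
import random

def interaction_step(states, location, beta, gamma, rng):
    """One SIR interaction tick inside the nodes.

    beta plays the role of virus-spread-chance, gamma of recovery-chance.
    """
    totals, infected = {}, {}
    for s, node in zip(states, location):
        totals[node] = totals.get(node, 0) + 1
        if s == "I":
            infected[node] = infected.get(node, 0) + 1
    new_states = []
    for s, node in zip(states, location):
        # Infection chance: beta times the infected fraction of the node.
        if s == "S" and rng.random() < beta * infected.get(node, 0) / totals[node]:
            s = "I"
        elif s == "I" and rng.random() < gamma:
            s = "R"
        new_states.append(s)
    return new_states

rng = random.Random(0)
print(interaction_step(["S", "S", "I", "S"], [0, 0, 0, 1], 0.6, 0.1, rng))
```

A susceptible person on a node with no infected people can never be infected in this sketch, which is why the placement of the first infected people matters.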
3.3 SIR compartment model (epidemiology).
3.4 Number-of-people(time) in a simple
Deterministic SIR Model.
Blue = Susceptible
Green = Infected
Red = Recovered
The dynamics of an epidemic are often much faster than the dynamics of birth and death;
therefore, birth and death are often omitted in simple compartmental models. The SIR
system without so-called vital dynamics (birth and death, sometimes called demography)
described above can be expressed by the following set of ordinary differential equations,
where N is the total number of people:
\[ \frac{dS}{dt} = -\frac{\beta I S}{N}, \qquad \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I \]
The beta and gamma coefficients are respectively the transition rates from susceptible to
infected and from infected to recovered, here called virus-spread-chance and
recovery-chance. Without the birth and death dynamics we can see that:
\[ \frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0 \]
and furthermore:
\[ S(t) + I(t) + R(t) = N = \text{constant} \]
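A minimal Python sketch of the deterministic SIR system without vital dynamics is given below. It assumes the standard form dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I and integrates it with a simple Euler scheme; the step size and the function name are choices made for this sketch, not taken from the original simulation.

```python
def deterministic_sir(n_people, initial_infected, beta, gamma, ticks, dt=0.1):
    """Integrate the SIR equations and return one (S, I, R) tuple per tick."""
    s = float(n_people - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    steps_per_tick = round(1 / dt)
    for _ in range(ticks):
        for _ in range(steps_per_tick):
            new_infected = beta * s * i / n_people * dt
            new_recovered = gamma * i * dt
            s -= new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, i, r))
    return history

# Parameters quoted in the text: 3000 people, 4 initially infected,
# virus-spread-chance 0.60, recovery-chance 0.10.
history = deterministic_sir(3000, 4, beta=0.60, gamma=0.10, ticks=100)
infected = [i for _, i, _ in history]
print(infected.index(max(infected)), round(max(infected)))
```

Because every person leaving one compartment enters another, S + I + R stays equal to N at every step, which is a convenient sanity check on any implementation.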
The scenario created in this work differs from a standard SIR model in many features. First
of all, the virus-spread-chance is inversely proportional to the number of susceptible
people. This means that the outbreak is delayed at the start, whereas at the peak of the
infection (when there are fewer susceptible people) it is amplified compared to the start.
A mixed SIR-SIS model is used: recovered people can become either susceptible again or cured
(the agent state in which infection is no longer possible), depending on a
variable called recovery-time (tr), randomly generated when the agent recovers. The
longer the agent stays in the recovered status, the more likely he is to become cured and
the less likely to become infected again.
In the scenario a number (initial-doctors) of medics M is generated randomly; each medic
cures a random number of infected people on a node with a probability of 50% (pc).
The medics’ behaviour changes dramatically when another player joins the game: the agent.
At the start of the outbreak a random number (max 10) of agents are created. They travel
freely from node to node and, if the number of infected people is too high (more
than 30% of the people on the node), they close the node. This stops the in/out
flow of infected people and prevents the virus from spreading.
A medic in a closed node is much more efficient at curing the infected. This agent-medic
behaviour is something that the model deliberately doesn’t know about.
In the code (3.5) the two-body interactions are carried out via nested commands to the
agentset “people” (the NetLogo command is called ask). The 𝜏 is set to 1 via “not
generated-in-this-loop?”. The medics have a 50% chance of curing an infected person, whereas
inside a closed node they have a 100% chance. Every recovered person has a
Gaussian-distributed recovery-time assigned at the change of status (infected to recovered).
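The node-closing rule and the medic cure chances described above can be sketched in Python. This is not the NetLogo code of Figure 3.5: the function names are hypothetical, and the sketch assumes that a medic tries to cure each infected person on the node independently.

```python
import random

def should_close(infected_on_node, people_on_node, threshold=0.30):
    """An agent closes a node when more than 30% of its people are infected."""
    return people_on_node > 0 and infected_on_node / people_on_node > threshold

def cure_probability(node_closed):
    """Medics cure with a 50% chance, or 100% inside a closed node."""
    return 1.0 if node_closed else 0.5

def medic_visit(infected_ids, node_closed, rng):
    """Return the ids of the infected people cured by one medic visit."""
    p = cure_probability(node_closed)
    return [pid for pid in infected_ids if rng.random() < p]

rng = random.Random(0)
closed = should_close(infected_on_node=4, people_on_node=10)
print(closed, medic_visit([11, 12, 13], closed, rng))
```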
3.5 Infection, recovery and part of the cure code
Scenario
We generate 3000 people evenly distributed over 150 nodes or locations. The network is
scale-free with a rewiring probability of 0.10.
The initial-outbreak-size is set to 4, the virus-spread-chance to 60%, the recovery-chance to
10%, the travel-tendency to 0.4, the outbreak-threshold to 0.3 and the number of doctors to
a reasonable 30 (1% of the population).
3.6 People distribution at the start
of the simulation.
3.7 Degree distribution of the
scale-free network.
3.8 People distribution after 3 ticks. The largest group of people (at least 10%)
is on the node with degree 30. It is a hub on which hundreds of
interactions take place.
3.9 Network structure. Scale-free with some shortcuts. Pink people are
the susceptible ones.
3.10 Populations after 115 ticks. Blue = Susceptible, Red = Infected, Light-green = Recovered,
Green = Cured.
The susceptible curve has a logistic-like form, as suggested by the previous differential
equations. The infected curve follows logistic growth, but as the number of infected
reaches 30% of the total population the outbreak alert is activated. From then on the
people travel much faster and the agents and the medics join in, with a dramatic decrease in
infected people as well as an increase in cured ones. Up to this point the cured count was
close to the recovered one.
The maximum infected count is close to 30% of the population (33%), on the 20th tick. From
then on the infected count fluctuates due to the micro-interactions in the nodes. Slowly but
surely it decreases until a zero-infected situation is reached: 514 ticks in this case. The end
of the outbreak is not a reliable measure because here the randomness of the
micro-interactions matters more than the actual macro situation.
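The quantities discussed here (peak size, peak timing and outbreak end) can be extracted from a time series of infected counts with a few lines of Python. The function name and the dictionary keys are hypothetical, and the input series in the usage line is made up for illustration; it is not data from the simulation.

```python
def outbreak_summary(infected_counts):
    """Summarize an infected-count series with one value per tick."""
    peak_size = max(infected_counts)
    peak_tick = infected_counts.index(peak_size)   # first tick at the maximum
    # End of the outbreak: first tick after the start with no infected people.
    end_tick = next((t for t, c in enumerate(infected_counts)
                     if t > 0 and c == 0), None)
    return {"peak_size": peak_size, "peak_tick": peak_tick, "end_tick": end_tick}

print(outbreak_summary([4, 10, 30, 12, 3, 0, 0]))
```

Comparing these summaries between the scenario and a model run gives a compact view of how well the peak and the duration are predicted; `end_tick` is `None` when the series never reaches zero.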
3.11 Populations trend from the start to the end of the simulation.
4. Model
The model chosen to replicate the scenario is the logistic growth of the basic
deterministic SIR. The initial number of infected as well as the total number of people
and the number of nodes are known. We will subsequently make 4 hypotheses about the
structure of the network, the pure SIR versus a modified SIS and the travel tendency of the
agents.
No hypothesis
4.1 Degree distribution of the homogeneous random network.
Without any hypothesis the network is random and homogeneous (4.3): the agents travel much
faster than in a scale-free network and they see, on every node, the same number of
neighbors. This implies that no hub is created and isolated nodes rarely exist (although
clusters could be isolated).
The outbreak involves more susceptible people at the same time: about 70% of them are
infected at the peak of the infection, which occurs around the 7th tick. The outbreak ends
around the 90th tick. So it has a faster and more aggressive initial development but also a
faster end, because of the fast growth of the recovered people, who can’t be infected again at
this stage of the model.
4.2 Outbreak of a Deterministic SIR in a random homogeneous network. The line in dark red is the trend
of the infected people in the scenario. beta = 0.60, gamma = 0.10.
4.3 Network view. Built with the idea that nodes (cities) are likely to be linked to nearby
cities as well as have some links with cities far away. Random network building often leaves
clusters of nodes isolated if the rewiring probability is low. The network is homogeneous, as
you can see from the size of the nodes, which is proportional to their degree (4.1).
Average degree = 4; rewiring probability = 0.05
Network Structure hypothesis
Even if we don’t know the network in detail, it’s reasonable to think that in real-scenario
forecasting one will not know the exact movement of the agents but can make a
hypothesis about the main routes and cities through which they will pass.
In this case the degree of each scenario node as well as its position in the world is taken.
The rewiring probability is fixed at 0.05 in this case, but it might need to be set around
0.10 because of the clustering tendency used.
This hypothesis doesn’t substantially change the trend of the infected people count, but it
generates a wider plateau at the peak of the infection (4.4), caused by the medium-low
mobility of the agents.
4.4 Trend of the infection in a scale-free network randomly rebuilt.
4.5 World rebuilt with the hypothesis that the nodes don’t have the same number of
neighbors. Some of them are hubs, others are isolated.
Cured, Doctors and Travel Hypotheses
The outbreak still evolves too fast and too widely, so we need to introduce something that
could delay and contain the infection. Three big hypotheses are made here:
1. A travel probability lower than 1, since a value of 1 is not realistic.
2. A SIR/SIS mixing mechanism in which a recovered person has a chance of being
infected again as well as of being completely cured.
3. Doctors who simply seek out infected people and cure them.
4.6 Travel probability = 0.1; Pure SIR, no doctors.
4.7 Travel probability = 0.4; Pure SIR, no doctors.
The travel probability plays a role in the diffusion and the control of the infected people. In
figures 4.6 and 4.7 it is clear that with a low travel probability the agents change state more
slowly (from susceptible to infected and from infected to recovered). Conversely, a
medium travel probability is sufficient to concentrate a lot of people in one or two nodes, so
that the agents are more likely to interact with each other.
The cured/recovered model is introduced, along with the doctors, who travel freely on the
network and cure infected people. This produces infected curves much closer to the
scenario’s: the outbreak is a bit slower, its peak always occurs around the 20th tick
and its end tends to be around the 500th tick. The latter is not something to rely
on because of the randomness; what matters is the trend of the infection.
4.8 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10
4.9 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10
4.10 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10, delay = 11
In the scenario the infection is regulated by non-linear effects. These are unknown in the
model and impossible to reproduce if one doesn’t know the differential equation behind them.
A delay was added to reproduce the trend of the infected people in the scenario (4.10).
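How the delay is implemented in the original model is not specified. One simple reading, sketched here in Python as an assumption, is that the model curve is held at its initial value for `delay` ticks and then follows its usual course; the function name is hypothetical and the input series is made up for illustration.

```python
def apply_delay(curve, delay):
    """Shift a per-tick curve to the right by `delay` ticks.

    The first value is repeated during the delay and the result keeps the
    original length, so it can be compared tick by tick with the scenario.
    """
    if delay <= 0:
        return list(curve)
    return ([curve[0]] * delay + list(curve))[:len(curve)]

print(apply_delay([4, 8, 16, 8], 2))
```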
In figures 4.11, 4.12 and 4.13 it is visible how a model depends on the structure of the
network: with a low rewiring probability some clusters can be excessively isolated, so that
the infection does not spread fully and, after 100 ticks, generates another small outbreak.
4.11 Tick = 31, cured probability = 0.02, number of doctors = 30, delay = 11
4.12 Complete outbreak: cured probability = 0.02, number of doctors = 30, delay = 11
4.13 beta = 0.70, gamma = 0.05, cured probability = 0.01, number of doctors = 30, delay = 9
Figures 4.14, 4.15 and 4.16 show the process of tweaking the parameters. The
rewiring probability went from 0.05 to 0.10, producing fewer isolated clusters.
4.14 Rewiring probability = 0.10, beta = 0.60, gamma = 0.10, number of doctors = 31,
cured probability = 0.01, delay = 11
4.15 Rewiring probability = 0.10, number of doctors = 36, cured probability = 0.02
4.16 Rewiring probability = 0.10, number of doctors = 32, cured probability = 0.01
5. Conclusions
The scenario was built with a lot of details and mechanisms that the model didn’t know
about: the spawning of infected people in isolated nodes, the money-making mechanism, the
cost of travel, the outbreak alert and the doctor-agent interaction.
By contrast, the model was based on a simple logistic growth differential equation
system (deterministic SIR model). It managed to reproduce the scenario trend of the
infected people through the progressive addition of hypotheses. The network structure, which
changes from homogeneous to heterogeneous, and the low travel probability mainly
set the peak position and width, whereas the SIR-SIS mixing and the introduction of
doctors set its amplitude and the outbreak duration.
The outbreak parameters are correctly provided: the scenario’s virus spread chance and
recovery chance are 60% and 10% respectively whereas the model’s parameters are beta =
0.60 and gamma = 0.10.
The network structure is similar to the scenario’s. Despite this, the money-making
process and the travel cost slow down the movement of the agents in the scenario. For this
reason the model’s travel probability is set to 0.1, whereas the scenario’s travel tendency
was 0.4, or 1 during the panic due to the outbreak alert.
In the scenario the people didn’t change status from recovered to cured in a linear way,
but setting a cured probability of 0.01 for the recovered seems to work well. Without
the agent-doctor mechanism the number of doctors needed to reproduce the scenario’s
infected trend is a bit higher: from 30 in the scenario to 32 in the model.
The non-linear infection growth of the scenario, the spawning of infected people on
isolated nodes and, above all, the fact that the model doesn’t have this information force a
delay to be introduced (delay = 11).
Lastly, it is clear that the model’s many randomness-based features make it
less reliable. This problem could be addressed by repeating the simulations many times,
providing an error on the simulation results and thereby validating them, or by introducing
more complex mathematical structures.
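Repeating the simulation and attaching an error to a result can be sketched in Python as follows. The function `toy_peak_tick` is a made-up stand-in for one full simulation run, and all names are hypothetical; the error reported is the standard error of the mean over the runs.

```python
import random
import statistics

def summarize_runs(run_once, n_runs, seed=0):
    """Run a stochastic simulation n_runs times (n_runs >= 2) and return the
    mean of its result together with the standard error of the mean."""
    values = [run_once(random.Random(seed + k)) for k in range(n_runs)]
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / n_runs ** 0.5
    return mean, standard_error

def toy_peak_tick(rng):
    """Placeholder for a real run: a peak tick scattered around 20."""
    return 20 + rng.randint(-2, 2)

mean, error = summarize_runs(toy_peak_tick, n_runs=50)
print(f"peak tick = {mean:.1f} +/- {error:.1f}")
```

Giving each run its own seeded generator keeps the runs independent while making the whole batch reproducible.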
Bibliography
[1] The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt.
https://www.sciencedirect.com/science/article/pii/S1755436517301275
[2] Emergence of scaling in random networks. A.-L. Barabási, R. Albert, Science, 1999.
http://science.sciencemag.org/content/286/5439/509
[3] NetLogo library: Virus on a Network.
http://ccl.northwestern.edu/netlogo/models/VirusonaNetwork
[4] NetLogo library: Preferential Attachment.
http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment