arXiv:1710.00317v1 [q-bio.PE] 1 Oct 2017


Malaria intensity in Colombia by regions and populations

Alejandro Feged-Rivadeneira 1,2, Andrés Ángel 3, Felipe González-Casabianca 3, Camilo Rivera 4

1 Department of Anthropology, Stanford University, Stanford, CA, USA
2 Department of Urban Management and Design, Universidad del Rosario, Bogotá, Colombia
3 Department of Mathematics, Universidad de los Andes, Bogotá, Colombia
4 Walmartlabs, Sunnyvale, CA, USA

Abstract

Determining the distribution of disease prevalence among heterogeneous populations at the national scale is fundamental for epidemiology and public health. Here, we use a combination of methods (spatial scan statistic, topological data analysis, epidemic profile) to study measurable differences in malaria intensity by regions and populations of Colombia. This study explores three main questions: What are the regions of Colombia where malaria is epidemic? What are the regions and populations in Colombia where malaria is endemic? What associations exist between epidemic outbreaks in different regions of Colombia? Plasmodium falciparum is most prevalent in the Pacific Coast, some regions of the Amazon Basin, and some regions of the Magdalena Basin. Plasmodium vivax is the most prevalent parasite in Colombia, particularly in the Northern Amazon Basin, the Caribbean, and municipalities of Sucre, Antioquia and Córdoba. Malaria has been reported to be most common among 15-45 year old men. First, we find that the age class at high risk of malaria infection ranges from 20 to 30 years, with an acute peak at 25 years of age. Second, this pattern was not found to be generalizable across Colombian populations: Indigenous and Afrocolombian populations experience endemic malaria (with household transmission). Third, clusters of epidemic malaria for Plasmodium vivax were detected across Southern Colombia, including the Amazon Basin and the Southern Pacific region. Plasmodium falciparum was epidemic in 13 of the 1,123 municipalities (1.2%). Some key locations act as bridges between epidemic and endemic regions. Finally, we generate a regional classification based on intensity and synchrony, dividing the country into epidemic areas and bridge areas.

1. Introduction

Malaria in Colombia has been studied from a variety of disciplines that describe disease patterns with dimensions such as the diversity of the vector [1, 2], characteristics of the parasite [3], social phenomena affecting disease transmission [4, 5], and geological phenomena [6, 7]. Mainly, national and local contexts are well understood for a country that presents unusual diversity of environments and social backgrounds (including vast cultural diversity), which, in turn, represents different characteristics of malaria transmission. In contrast with Sub-Saharan Africa, where malaria is commonly a deadly disease affecting primarily children, Colombia is not considered particularly relevant in malarial disease studies given the relatively low mortality when compared with Sub-Saharan Africa. However, malaria in Colombia presents certain characteristics that resemble those observed in Southeast Asia. Colombia was one of the first countries where resistance to chloroquine-based treatment was reported. Varied malaria intensity among segregated and diverse populations inhabiting different and unique environments makes Colombia one of the few cases where malaria is endemic and where disease patterns are inconsistent across regions, in contrast to several countries that follow a consistent pattern of infection, or whose segregated vulnerable populations do not differ in their epidemic patterns [8, 9]. This does not mean that other countries have a homogeneous experience of malaria intensity across subpopulations or regions. However, disease distribution among Colombian populations has caused the parasite to generate resistance to treatment, unlike in several other countries, with the exception of those in Southeast Asia.

Malaria is a complex disease, and factors associated with disease severity and resistance have been reported, yet genetic resistance to malaria is better understood than resistance to any other human disease [10]. However, the strong geographical association between resistance to the pathogen and disease severity remains a major challenge in assessing the causality of human genetic resistance [10]. We know from evolutionary theory that two critical conditions must hold for selection to occur: 1) a population with genetic diversity has to exist for selection to be able to operate; 2) there must be a differential in the reproductive value of the trait in question for adaptation to evolve. Because African populations both exhibit genetic diversity and experience severe malaria, genetic resistance to the pathogen appears to have emerged independently in different foci [11]. However, unlike Africa, Colombia has no record of human genetic resistance to malaria. On the contrary, the parasite appears to have developed resistance to treatment. Until recently, it was unknown whether pathogen resistance was the result of selection of mutant strains under drug pressure, the spread of resistant strains, or adaptation of previously sensitive parasites [12]. Genetic evidence suggests that resistant malaria emerged in at least 4 different geographical foci, consistent with the history of reports of resistant Plasmodium falciparum on the Thailand-Cambodia border and in Colombia in the 1950s, then spreading for two decades through South America, Asia and India, and then to Africa, in Kenya and Tanzania, in the late 1970s [13]. Resistant Plasmodium vivax was first reported in Papua New Guinea in 1989; it is currently present in Southeast Asia and is suspected to occur in South America [13]. Studies have found resistant Plasmodium vivax at a rate of 11% in representative samples of all blood smears collected in two endemic geographical regions in Colombia, Llanos Orientales (Eastern Plains) and Urabá [14], while others have found no evidence of resistant Plasmodium vivax forms in the Pacific Coast and the Amazon Basin [15]. However, therapeutic failure rates of Plasmodium falciparum (for representative samples of all blood smears collected) have been reported as high as 78% for these same regions [15], and 67% in Antioquia [16]. More recent assessments of malaria prevalence in endemic areas also suggest that uncomplicated malaria with low parasitemia is one of the biggest challenges for malaria control strategies [5, 17], and studies indicate that the observed differences are not attributable to human genetic traits that confer resistance [18].

Few studies have addressed malarial epidemiology by regions and populations to explore the role of malaria intensity in the emergence of resistant forms of the parasite. However, the role of Colombia in the global epidemiological context of malaria indicates that the country may present unique characteristics for disease transmission. Mainly, the presence, absence, and most importantly, emergence of resistant forms of the parasite in different regions suggests that isolated and distinct epidemic regions exist within the national boundaries, and such characteristics may play a distinctive role in the evolution of the parasite. Here we address the intensity of malaria by regions and human populations in Colombia, and the degree to which the epidemic characteristics of different regions affect each other.

One key aspect of malaria dynamics in Colombia remains poorly understood: how many different epidemic regions exist, and how do subpopulations in these regions experience malaria? During fieldwork, we interacted with local health officials who conducted malaria prevention programs at both local and national levels. Each public health official had knowledge and expertise about epidemic dynamics in their specific territorial assignment. However, a lack of systematic approaches hampers the ability to formalize such knowledge. Malaria intensity and the social aspects that condition the transmission of the parasite drive public health interventions. However, the regional designations are yet to be formalized based on analysis of malaria dynamics. Decisions about prevention strategies, and how to target the most vulnerable populations, are made based primarily on the expertise of local health officials, as indicated by the classifications in [19, 20, 6].

Here we generate a systematic classification of the malarious regions and subpopulations of Colombia, to characterize locations and subpopulations in terms of epidemiological aspects of the parasite. We address three basic questions concerning the intensity of malarial infection by ethnicity and region:

(1) Is this population experiencing higher malaria intensity than other regions of the country?

(2) Is the parasite persisting endemically within this population?

(3) Are the epidemic characteristics of this subpopulation affecting other subpopulations?

Specifically, we examine eight years of malarial case reports for malaria intensity, synchrony and segregation by ethnicity. First, we employ a widely used [22, 23, 24] outbreak detection algorithm [21] to identify clusters in space with outbreaks of malaria. Second, we apply methods from Topological Data Analysis (TDA) [25, 26] to discover synchronous outbreaks: areas that present similar time patterns of malarial epidemic. Finally, regional case reports are explored with descriptive statistics to analyze the intensity of malaria exposure by ethnicity.

2. Background

Since John Snow’s seminal study of cholera in London, epidemiology has been a spatial discipline [27]. Geographical disease patterns have been widely described for numerous pathogens and regions. We use three methods to analyze malaria in Colombia: disease clustering, disease visualization, and ecological analysis.

The production of good quality maps to understand and visualize the risk of disease transmission is recognized as one of the fundamental tools for malaria control strategies, specifically for understanding the relationship between malaria endemicity and the health impact of malaria [28]. Studies suggest that annual entomological inoculation rates (commonly computed as the product of the daily human biting rate, the sporozoite rate of the caught mosquitoes, and the number of days per year, 365 [29]) in Ghana (100-1000), Kenya (10-60) and Gambia (less than 10) are associated with the prevention of all-cause childhood mortality by insecticide-treated bed nets, with efficacy of 17%, 33%, and 63%, respectively [28]. These results suggest that public health policies should vary according to malaria endemicity, since bed nets have been the linchpin of malaria prevention strategies since DDT was discontinued as a viable alternative. Yet, evidence suggests that there are several contexts in which bed nets are not efficient [28]. In locations where malaria is intense, the use of bed nets is less efficient at preventing the burden of the disease.
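The annual entomological inoculation rate described above is a plain product of three quantities, which the following minimal sketch makes explicit. The function name and the input values are hypothetical and chosen only for illustration; they are not taken from [28] or [29].

```python
def annual_eir(daily_biting_rate: float, sporozoite_rate: float,
               days_per_year: int = 365) -> float:
    """Annual EIR = daily human biting rate x sporozoite rate x days per year."""
    if daily_biting_rate < 0:
        raise ValueError("daily_biting_rate must be non-negative")
    if not 0.0 <= sporozoite_rate <= 1.0:
        raise ValueError("sporozoite_rate must be a proportion in [0, 1]")
    return daily_biting_rate * sporozoite_rate * days_per_year


# Hypothetical inputs: 2 bites per person per day, 5% of caught
# mosquitoes carrying sporozoites.
eir = annual_eir(2.0, 0.05)
print(f"annual EIR: {eir:.1f} infective bites per person per year")
```

With these made-up inputs the result falls inside the 10-60 range quoted for Kenya, which is only meant to show the order of magnitude of the quantity.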

Due to the scarcity of multi-sited studies across different countries, variation in the relationship between endemicity and overall health remains unknown [28]. However, within-country variation of malaria has been the subject of numerous studies. One study that addresses this relationship is [30], which uses GIS and malaria case reports to map malaria intensity in Kenya. Its results also question the use of treated bed nets in regions where malaria is intense, because in these communities bed nets are least efficient [30].

Spatial variation in malarial infection within countries has been addressed using maps of the risk of contracting the disease. For example, [31] produced a more accurate visualization of the risk of contracting malaria in Mali by combining regression analysis with “kriging” (i.e., an interpolation method similar to smoothing fitted values) to account for local responses to environmental conditions such as weather, population and other topographic and sociological features. Using those methods, they were able to identify regions where the risk is higher than represented in traditional maps [31]. A similar approach, but based on entomological and demographic geo-coded records, is implemented with a GIS analysis to describe local risk of infection based upon proximity to breeding sites and human populations [32]. [33] have implemented a variation of these risk maps by integrating remote sensing data to identify locations of high transmission based on human-vector interaction for a region in Mexico, including variation by season.

The applicability of mobile phone data to relate human mobility to disease dynamics does pose some interesting caveats. First, the fraction of the population with a high degree of mobility remains constant in some studies, but this does not necessarily mean that it is precisely that fraction of the population that is moving pathogens from one place to another [34]. The Nukak represent one of the most endemic and vulnerable populations in terms of malaria persistence, and are highly mobile. Furthermore, multi-scale network models of human mobility suggest that local migration plays an important role in the synchronization of epidemics among subpopulations [35], and that small, highly mobile populations play a fundamental role in the dispersal of epidemics.

Estimating the effect of migration on pathogen loads has been a growing interest of epidemiologists in recent years, and multiple methods have been implemented to address this interaction. A different data-driven approach to examining the effect of human mobility on epidemics has been the gravity model, used to evaluate measles outbreaks, both by age classes and by urban and rural settings [36, 37, 38]. The main finding of this approach was that population densities were the main driver of outbreak seasonality across different environments [36, 37, 38]. Furthermore, the same group has used nighttime light imagery to estimate the effect of changing patterns of population densities on disease outbreaks [39]. Unfortunately, few comparative studies exist to determine which method is more effective under which conditions and for which diseases. However, the method of nighttime light imagery provides good estimates of the mobility of populations without access to phone services, often the most vulnerable populations in terms of disease prevalence.

Although the gravity model has been mostly used for directly transmitted diseases, understanding the effect of human mobility on disease epidemics, and more generally how disease disperses over space and time, has been one of the fundamental questions in contemporary spatial epidemiology.

Two approaches have been used to analyze the synchrony of disease outbreaks over space and time, controlling for seasonal and environmental variables. The first, spectral analysis, has been used to describe the association between aggravation of asthma symptoms and temperature or atmospheric contamination levels [40, 41], and the association between air pollution and mortality [41]. This technique has also been used to study the effect of climatic variation on cholera [42, 43], malarial epidemics [44], and the seasonality of sexually transmitted diseases (STDs) [45].

Some authors suggest that these methods have limitations, because they can only be used for time-series data whose statistical properties do not change over time, yet epidemic data are inherently complex and non-stationary [41]. Furthermore, evidence suggests that the characteristics of epidemic data do change over time [46, 47, 41].

The limitations of spectral decomposition methods led to the adoption of the second technique, the most widespread for understanding disease dynamics over space and time, coupled with climatic and environmental conditions, from a non-stationary perspective: wavelets, a method used to show how time series vary as a function of time and space [41, 48].

Wavelet analysis has been used to study geographical hierarchies of measles epidemics, and the observed effect of vaccination policies over time [49]. Associations between dengue epidemics and El Niño Southern Oscillation (ENSO) have also been documented using this method [50]. Kreppel [51] found an association between ENSO, the Indian Ocean Dipole (IOD) and plague dynamics in Madagascar, and [52] found a similar effect of those two climatic phenomena on infectious gastroenteritis in Japan. [53] documented that Buruli ulcer is affected by short and long rainfall patterns in French Guiana, as well as stochastic events such as ENSO. The relationship between ENSO and cutaneous leishmaniasis has also been documented for Costa Rica [54]. Jose [55] studied the changing patterns and seasonality of Australian rotavirus epidemics by comparing a multiplicity of methods including wavelet analysis, and detected seasonal, biannual and quinquennial periods, although a three-year epidemic period was also found to be dominant. Spectral analysis confirms that serotype harmonics interact in a complex, non-linear fashion, yielding an observable overall pattern beyond the isolated dynamics of each separate serotype that is more than the sum of the parts; the inherent dynamics remain unchanged but the amplitude of disease infection is modified [55].

Spatial analysis methods have been applied to disease cluster identification. The main approaches used are: K-cluster analysis, which detects global clusters based on each case point [56]; and the geographical machine [57] and the scan statistic [21], which work by aggregating cases in different areas and performing a hypothesis test based on a Bernoulli null model, with the advantageous difference that the scan statistic can perform multiple tests simultaneously. We present an implementation of the scan statistic in this case study. Small, isolated outbreaks of malaria in specific communities have been identified as “discrete mini epidemics”, which represent disease severity, by using space-time cluster identification [58]. Disease risk by geographical location has also been modeled with simple logistic regressions that include the altitude and physical coordinates of each individual within a case-control study [59]. The scan statistic method has been used by [60] to identify disease clusters over space and time in a South African region. [61] have also implemented the scan statistic method in China to identify clusters and suggest public health resource optimization. Faires [62] used this method to identify clusters of Clostridium difficile over time in Ontario, Canada. Duczma [63] implemented the scan statistic to study Chagas’ disease in Brazil, while [64] do the same for end-stage renal disease (ESRD) in northern France. In Virginia, the increasing burden of Lyme disease was documented using spatiotemporal scan statistics [65]. In all cases, studies were able to identify areas with more cases than expected, highlighting in many cases the relevance of regions that did not present a comparatively higher incidence.

Globally, [66] have used maximum likelihood methods (i.e., an approach similar to the scan statistic) to predict areas where malaria is likely to expand as a result of climate change.

Topological Data Analysis (TDA) applies ideas from the mathematical field of topology to study high-dimensional data by obtaining invariants and useful representations of the shape of the data.

In recent years, TDA has been used to find subgroups of individuals with unique genetic and prognostic profiles of breast cancer [26, 67], political alliances in the congress [26], profiles of basketball players [26], pathogen persistence in soil [68], novel patterns in spinal and brain injury [69], and subgroups of individuals with different complications from type 2 diabetes [70], among others.

3. Methods

The analysis for this study was generated from case reports based on active and passive detection methods compiled by the Colombian government. Reporting of malaria cases is mandatory. Treatment is provided for free to each case, and a positive test is required to disburse the medication. Data were accessed by requesting a user account to query the Sispro database of reported cases maintained by the Ministry of Health.

All cases are laboratory confirmed and geocoded to the municipality level. We included data for 1,156 municipalities, which range in area from 15 to 65,674 km²; the total area sampled was 1.142 million km². For each municipality, we also analyzed ethnic membership, comprising 3,369 different populations. The distribution of malaria cases and the segmentation of the data are displayed in Fig 1 and Table 1.

3.1. Clustering. The two main objectives are to determine whether malaria outbreaks exist in Colombia and, if so, to determine their location. To address these objectives, we apply scan statistics to perform a hypothesis test in each municipality, examining whether it presents an outbreak. These approaches have been widely used in epidemiological studies [21, 71, 72].

The hypothesis-testing model is based mainly on the Bernoulli model of [21], using the R package SpatialEpi [73].

Figure 1. Distribution of registered malaria cases in Colombia between the years 2007 and 2015 (panel title: Malaria totals by weeks in Colombia; x-axis: Date; y-axis: Cases (count); series: Plasmodium falciparum, Plasmodium vivax).

Table 1. Summary and segmentation of registered malaria cases in Colombia between the years 2007 and 2015.

Attribute                  Total cases   Percentage (%)
Sex
  M                        228075        63.32
  F                        132075        36.67
Ethnicity
  Afro                     110333        30.63
  Indigenous               38199         10.60
  Other                    211618        58.75
Type of malaria
  Plasmodium falciparum    115260        32.00
  Plasmodium vivax         244890        67.99
Total                      360150

Given the data aggregated by municipality for 2007-2015, each record is assigned to the centroid of its municipality. Because the set of possible outbreaks (all possible aggregations of neighboring municipalities) is almost unlimited in terms of shape and size, the first step is to approximate this set. In this case, a grid $G$ of $N$ by $N$ was overlaid onto Colombia’s jurisdictional boundaries, and the set of possible outbreaks was limited to all possible subrectangles within the grid.
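To give a sense of how large the candidate set becomes once outbreaks are restricted to subrectangles of the grid, the following sketch enumerates them for a small grid. The helper names and the grid size are hypothetical, and the paper does not state the value of N it used; a subrectangle is taken here to be any axis-aligned block of grid cells.

```python
from itertools import combinations
from typing import Iterator, Tuple


def sub_rectangles(n: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield every axis-aligned subrectangle of an n-by-n grid as
    (row_start, row_end, col_start, col_end), end indices exclusive."""
    cuts = range(n + 1)  # the n + 1 grid lines in each direction
    for r0, r1 in combinations(cuts, 2):
        for c0, c1 in combinations(cuts, 2):
            yield (r0, r1, c0, c1)


def count_sub_rectangles(n: int) -> int:
    """Closed form: choose 2 of the n + 1 grid lines in each direction."""
    return (n * (n + 1) // 2) ** 2


n = 4  # hypothetical grid size, for illustration only
candidates = list(sub_rectangles(n))
print(len(candidates), count_sub_rectangles(n))
```

The count grows on the order of N to the fourth power, so the grid resolution trades spatial detail against the number of candidate zones that have to be scored.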

Now, under the Bernoulli model, we consider a measurement $m$ for each rectangle $R$ contained in $G$, where $m(R)$ is an integer, in our specific case the number of individuals inside the given rectangle. We assume that there is a rectangle $Z$ contained in $G$ such that each individual inside $Z$ has a probability $p$ of being infected, while individuals outside $Z$ have a probability $q$. Let $n_R$ be the number of observed malaria cases inside $R$. Assuming a Bernoulli model and the following hypotheses for our unknown parameters $p$ and $q$:

$$H_0 : p = q \tag{1}$$

$$H_1 : p > q \tag{2}$$

we have these possible distributions:

• Assuming $H_0$:

$$n_R \sim \mathrm{Bin}(m(R), p) \quad \forall R \subseteq G \tag{3}$$

• Assuming $H_1$:

$$n_R \sim \mathrm{Bin}(m(R), p) \quad \forall R \subseteq Z \qquad \text{and} \qquad n_R \sim \mathrm{Bin}(m(R), q) \quad \forall R \subseteq Z^C \tag{4}$$

Hence, under $H_1$, $Z$ is a region with a potential malaria outbreak.

Lastly, the third and final step is to establish a measure of density for each subrectangle, in this case the likelihood ratio. This measure of density has desirable properties for comparing rectangles of different sizes [72]. [21] derives the formula for the likelihood ratio of a generic region. The scan statistic ($ss$) is defined as the highest density measurement over all subrectangles:

$$ss^* = \max_R \, ss(R) \tag{5}$$

$$ss(R) = p^{n_R} (1 - p)^{m(R) - n_R} \, q^{n_G - n_R} (1 - q)^{(m(G) - m(R)) - (n_G - n_R)} \tag{7}$$

Here $n_G$ and $m(G)$ are the corresponding counts for the whole grid $G$. The local measurement $ss(R)$ can be interpreted as the likelihood that subrectangle $R$ is an outbreak.
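As a rough illustration of Eq 7, the sketch below evaluates the logarithm of ss(R) for one subrectangle and the corresponding log likelihood ratio against the null hypothesis. It assumes plug-in estimates for p and q (the observed case rates inside and outside R), which is the usual choice for the Bernoulli scan statistic but is not spelled out in the text. The function names and counts are hypothetical, and this is not the SpatialEpi implementation.

```python
import math


def xlogy(x: float, y: float) -> float:
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_ss(n_r: int, m_r: int, n_g: int, m_g: int) -> float:
    """Log of ss(R) in Eq 7, with p and q replaced by observed rates
    (an assumption). Requires 0 < m_r < m_g."""
    n_out, m_out = n_g - n_r, m_g - m_r
    p = n_r / m_r        # case rate inside R
    q = n_out / m_out    # case rate outside R
    return (xlogy(n_r, p) + xlogy(m_r - n_r, 1 - p)
            + xlogy(n_out, q) + xlogy(m_out - n_out, 1 - q))


def log_likelihood_ratio(n_r: int, m_r: int, n_g: int, m_g: int) -> float:
    """Log ratio of the H1 likelihood to the H0 likelihood; 0 unless p > q."""
    if n_r * (m_g - m_r) <= (n_g - n_r) * m_r:  # p <= q, compared exactly
        return 0.0
    r = n_g / m_g        # common rate under H0
    log_null = xlogy(n_g, r) + xlogy(m_g - n_g, 1 - r)
    return log_ss(n_r, m_r, n_g, m_g) - log_null


# Hypothetical counts: 30 cases among 100 people in R,
# 50 cases among 1000 people in the whole grid.
print(log_ss(30, 100, 50, 1000))
print(log_likelihood_ratio(30, 100, 50, 1000))
```

Working on the log scale avoids numerical underflow, since the raw product in Eq 7 becomes vanishingly small for realistic population sizes.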

To test the hypothesis represented in Eq 1, a Monte Carlo simulation was used to obtain the histogram of the statistic $ss^*$ under the null hypothesis. The value of $ss^*$ computed from the observed data is then assessed against this distribution. If the resulting p-value is smaller than 0.05, $H_0$ is rejected and we assume an outbreak.
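The Monte Carlo step can be sketched as follows. To keep the example short it makes several simplifying assumptions that are not from the paper: the cells form a one-dimensional strip rather than an N by N grid, candidate zones are contiguous windows of cells, the statistic is the log likelihood ratio with plug-in rates, and the null distribution is generated by reassigning the observed number of cases uniformly at random among all individuals. All names and numbers are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate


def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_lr(n_r, m_r, n_g, m_g):
    """Log likelihood ratio for one zone; 0 unless the inside rate is higher."""
    n_out, m_out = n_g - n_r, m_g - m_r
    if m_out == 0 or n_r * m_out <= n_out * m_r:
        return 0.0
    p, q, r = n_r / m_r, n_out / m_out, n_g / m_g
    alt = (xlogy(n_r, p) + xlogy(m_r - n_r, 1 - p)
           + xlogy(n_out, q) + xlogy(m_out - n_out, 1 - q))
    return alt - (xlogy(n_g, r) + xlogy(m_g - n_g, 1 - r))


def scan_statistic(cases, pops):
    """Maximum log likelihood ratio over all contiguous windows of cells."""
    n_g, m_g = sum(cases), sum(pops)
    best = 0.0
    for i in range(len(cases)):
        n_r = m_r = 0
        for j in range(i, len(cases)):
            n_r += cases[j]
            m_r += pops[j]
            best = max(best, log_lr(n_r, m_r, n_g, m_g))
    return best


def monte_carlo_p_value(cases, pops, n_sim=199, seed=0):
    """Return the observed statistic and its Monte Carlo p-value under H0."""
    rng = random.Random(seed)
    observed = scan_statistic(cases, pops)
    bounds = list(accumulate(pops))  # cumulative population per cell
    n_g, m_g = sum(cases), bounds[-1]
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(pops)
        # Under H0 every individual is equally likely to be a case.
        for person in rng.sample(range(m_g), n_g):
            sim[bisect.bisect_right(bounds, person)] += 1
        if scan_statistic(sim, pops) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical data: six cells of 500 people, with an excess in the third cell.
cases = [5, 4, 40, 6, 5, 3]
pops = [500] * 6
observed, p_value = monte_carlo_p_value(cases, pops)
print(f"log LR = {observed:.2f}, p-value = {p_value:.3f}")
```

The p-value is the rank of the observed statistic among the simulated ones, so the null hypothesis is rejected when few simulated data sets produce a statistic at least as large as the observed one.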

3.2. Synchronous epidemic visualization. The main objectives are to determine whether abnormal behaviors are related across municipalities and, if a relationship exists, to find groups of municipalities that have similar temporal patterns, independent of their geographical layout. To address these matters we turn to Topological Data Analysis.

We apply the Mapper method, introduced in [25], which constructs a representation of the data in the form of a graph. This graph allows the analysis and visualization of the data. The vertices of the graph correspond to local clusters, and the interactions between these clusters are encoded in the edges of the graph.

Figure 2. We start with a given data set (image A); for this example the points correspond to a sample of the unit circle with a small amount of noise. For convenience we use the Euclidean distance to calculate the distance between each pair of points. In the next step, we select the projection onto the Y coordinate as our filter function and apply it to the data set (image B).

Mapper detects phenomena that appear at both large and small scales better than other methods, such as principal component analysis (PCA) and clustering algorithms. Mapper can be considered a hybrid method that performs partial clustering, where the regions in which the clustering is done are guided by the filter. It is a refinement of clustering and scatterplot methods like PCA.

The input of the Mapper method is a collection of points with a notion of similarity and a filter, a function defined on the collection of points. The filter is used to define pieces that cover the collection of points. We apply a clustering algorithm to each piece to obtain a set of local clusters. These are the vertices of the graph. Edges are added to the graph in the following way: two local clusters are connected if they have points in common. Since each vertex and edge corresponds to a subcollection of points, we can consider the graph to have weights. Figs 2-4 give a visual road map through the Mapper algorithm applied to a sample from the unit circle.
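The steps just described can be sketched on the noisy unit circle example using only the Python standard library. This is a minimal illustration, not the implementation used in the study: the clustering step is assumed to be single-linkage grouping with a fixed distance threshold, and the sample size, noise level, number of intervals, overlap and threshold are all hypothetical choices.

```python
import math
import random
from itertools import combinations


def noisy_circle(n_points=200, noise=0.05, seed=1):
    """Sample points near the unit circle."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_points):
        t = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((math.cos(t) + rng.gauss(0, noise),
                    math.sin(t) + rng.gauss(0, noise)))
    return pts


def overlapping_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal intervals; neighbours share `overlap` of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Connected components of the graph linking points closer than eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) < eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals=5, overlap=0.3, eps=0.3):
    """Return (nodes, edges): nodes are sets of point indices, edges join
    nodes that share at least one point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in overlapping_intervals(min(values), max(values), n_intervals, overlap):
        # Small tolerance so the extreme points are not lost to rounding.
        piece = [i for i, v in enumerate(values) if a - 1e-9 <= v <= b + 1e-9]
        nodes.extend(single_linkage(piece, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


points = noisy_circle()
# Filter: projection onto the Y coordinate, as in Fig 2.
nodes, edges = mapper(points, filter_fn=lambda p: p[1])
print(len(nodes), "nodes,", len(edges), "edges")
```

For a well-sampled circle the resulting graph is expected to form a single loop, since the middle intervals split into a left and a right cluster while the top and bottom intervals each give one cluster; the exact node and edge counts depend on the sample and on the parameters chosen.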

In this study, we implement TDA on disease data in Colombia to find topological characteristics that describe sociodemographic and spatiotemporal patterns.

Figure 3. We now divide the image of the data set (under the filter function) into evenly distributed overlapping intervals (image C) and compute the corresponding points in their pre-image (image D). Notice how each pair of overlapping intervals defines two different subsets of data that can have elements in common.

For each of the 1,156 municipalities we calculated an epidemic occurrence vector consisting of binary values for each week from 2007 to 2015. To construct each weekly entry, we executed a Kulldorff clustering procedure (as explained in the previous section) among all municipalities, but only with data from the given week. A municipality k therefore has a 1 in a given entry if it suffered a malaria epidemic in the corresponding week, and a 0 otherwise.

To this new sample of vectors we applied TDA, selecting the cosine distance as the notion of similarity and the first and second principal components as the filter function, in order to search for significant clusters.
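To make the similarity notion concrete, the following sketch computes the cosine distance between binary epidemic occurrence vectors. The municipality names and the eight-week vectors are hypothetical, and treating an all-zero vector as maximally distant from everything is an assumption of this sketch, since the cosine distance is undefined in that case.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros
    (a convention assumed here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical weekly occurrence vectors (1 = epidemic detected that week).
occurrence = {
    "municipality_a": [0, 1, 1, 0, 0, 1, 0, 0],
    "municipality_b": [0, 1, 1, 0, 0, 0, 0, 0],
    "municipality_c": [1, 0, 0, 0, 1, 0, 1, 0],
}

names = sorted(occurrence)
distances = {
    (a, b): cosine_distance(occurrence[a], occurrence[b])
    for a in names for b in names if a < b
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

For binary vectors the distance depends only on how many epidemic weeks two municipalities share relative to how many each has, so municipalities with synchronous outbreaks end up close together regardless of where they are located.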

The output of the Mapper algorithm is a graph in which we select significant subgraphs by their size and other characteristics. To visualize the municipalities that appear in the subgraphs of the Mapper output, we build another graph whose nodes are the municipalities that appear in several nodes of the Mapper graph.

3.3. Ethnicity. Case reports, collected by the national surveillance system and confirmed by laboratory, include components of age and ethnicity in the notification form. This completed form is required by law for every case reported in the country, and treatment to cure the disease is provided by the government for each case. Ethnicity and age are self-reported and should be interpreted with caution (e.g. no genetic resistance can be inferred). Three main ethnic groups were used in this paper: indigenous, Afrocolombian and other (with no ethnic denomination, ND).

Figure 4. Inside every interval-defined subset of data, we execute a clustering algorithm to detect isolated groups of points (image E). Each of the resulting groups corresponds to a node in the output graph (image F). Notice how nodes are joined by edges when their corresponding groups have points of the data set in common.

Data for age and ethnicity were displayed with descriptive statistics to develop patterns of malaria intensity by population for the clusters identified with TDA, and to identify two patterns of intensity. First, the occupational hazard risk profile is characterized by a peak within a particular age group and contains a pronounced sex difference [74, 75]. Second, an endemic risk profile is characterized by intense malaria exposure at young ages and reduced malaria at later ages, due to overexposure [74, 75].

4. Results

4.1. Clustering. We found an outbreak of Plasmodium vivax malaria that comprised the Amazon Basin, including the departments of Amazonas, Caquetá, Meta, Guaviare, Putumayo and Nariño. Regions of Vichada, Chocó, Caldas and Antioquia also presented outbreaks, as did the region surrounding Barranquilla in the Caribbean. Singular clusters for this species were detected in parts of Putumayo. Significant outbreaks of Plasmodium falciparum malaria were found in municipalities of Chocó, Risaralda, Antioquia, Nariño and Guaviare. Singular clusters for this species were detected in parts of Nariño. All significant outbreaks are highlighted in Figs 5-7.

4.2. Synchronous epidemic visualization. The TDA enabled us to find at least 5 groups of municipalities with similar behaviors (Fig 8 and Fig 9). In particular, three of these groups show clusters with high overall disease intensity, defined as the number of cases in infants (age below 5 years) over the total number of cases. The geographic distribution of these clusters is as follows: two of these groups are concentrated on the Pacific and Caribbean coasts, respectively, showing a geographic relation among their municipalities; the remaining three groups have their members scattered around different parts of the country, including Chocó, Guainía, Antioquia and Casanare.
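The disease intensity ratio defined above is simple to compute from a list of case ages. The sketch below is only illustrative: the function name, the sample ages and the choice to return 0.0 for an empty list are assumptions.

```python
def disease_intensity(case_ages, infant_cutoff=5):
    """Share of cases whose age is below infant_cutoff (in years)."""
    if not case_ages:
        return 0.0  # assumption: no cases means zero intensity
    infant_cases = sum(1 for age in case_ages if age < infant_cutoff)
    return infant_cases / len(case_ages)


# Hypothetical ages for ten reported cases in one cluster.
example = disease_intensity([1, 3, 20, 25, 30, 40, 4, 60, 22, 18])
```

In this made-up sample three of the ten cases are under 5 years old, so the intensity is 0.3.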


[Figure 5 image: map of Colombia titled "Epidemic Clusters for Malaria (All Parasites)"; axes: long, lat; legend "Occurrence": Epidemic, Not Epidemic.]

Figure 5. Significant outbreaks of malaria (all parasites) in Colombia from 2007-2015, calculated using the scan statistic developed by [21] based on a likelihood ratio. The significance threshold parameter was calculated using a Bernoulli model where cases were simulated for each municipality, and taking the maximum value. The process was iterated many times and the distribution of the maximum values was calculated to determine the 95% confidence interval. The method detects significant outbreaks of malarial infection along the Pacific Coast, the Magdalena river Basin, and the Amazon river Basin.


[Figure 6 image: map of Colombia titled "Epidemic Clusters for Plasmodium falciparum"; axes: long, lat; legend "Occurrence": Epidemic, Not Epidemic.]

Figure 6. Significant outbreaks of Plasmodium falciparum in Colombia from 2007-2015. The method used to find significant outbreaks is the same as described for Fig 5. Significant clusters were observed in municipalities along the Pacific Coast, the border with Panama, and Northern Antioquia (the tertiary Cauca river Basin). The municipalities of Policarpa and Cumbitara, in Nariño, appear to be a hidden cluster for this parasite, since they were not marked as epidemic when considering all malarial parasites (Fig 5).


[Figure 7 image: map of Colombia titled "Epidemic Clusters for Plasmodium vivax"; axes: long, lat; legend "Occurrence": Epidemic, Not Epidemic.]

Figure 7. Significant outbreaks of Plasmodium vivax in Colombia from 2007-2015. The method used to find significant outbreaks is the same as described for Fig 5. Significant clusters were observed in municipalities of the departments of Córdoba, Vichada and Antioquia. The municipality of Orito, in Putumayo, appears to be a hidden cluster for this parasite, since it was not marked as epidemic when considering all malarial parasites (Fig 5).


Figure 8. Graph constructed using TDA and principal component plot using the epidemic occurrence vectors, where selected groups have been highlighted. Highlighted groups were selected by size and high overall disease intensity (clusters 1, 2 and 5 have an average of 10% disease intensity). Each cluster can be interpreted as a group of municipalities with malaria incidence that have similar temporal behavior.

In the TDA graph each node represents a group of municipalities. The size of each node is proportional to the number of municipalities in the group, and two nodes have an edge between them if they have at least one municipality in common.

Now, it is possible to visualize the municipalities appearing in the TDA graph geographically, to get a notion of the part of Colombia each cluster represents.

[Figure 9 image: map of Colombia titled "Selected Clusters (Outliers)"; axes: long, lat; legend "Cluster": Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5.]

Figure 9. Selected municipalities by TDA over the Colombian territory. As expected, the clusters follow a geographic pattern, since the time series were constructed using a Kulldorff procedure that detects clusters geographically. An unexpected result happens in cluster 4, where the cluster is divided between two major non-adjacent regions: Antioquia and Nariño. This means that the malaria epidemic follows the same time pattern in these two regions, even though they are geographically separate.

Table 2. Selected central municipalities after executing TDA over the epidemic occurrence vectors. These are the municipalities responsible for the connectivity among their respective groups and subgraphs.

Cluster  State      Locality               Rural Pop.  Urban Pop.  Total Pop.
1        Chocó      Quibdó                      11752      101134      112886
2        Chocó      Bajo Baudó                  13752        2623       16375
3        Guaviare   San José del Guaviare       19131       34863       53994
4        Córdoba    Tierralta                   45895       32875       78770
5        Antioquia  Santa Fe de Antioquia        9267       13636       22903

Each of the small colored dots in the map (Fig 9) corresponds to a municipality contained in some node of the correspondingly colored cluster. Notice that there are arrows between some points in the map; they represent connections between the municipalities (these connections are not the same as the edges between nodes in the graph). Before we mention how these connections are constructed, let us recall what centrality means for a municipality:

Given a certain municipality, its centrality corresponds to the number of nodes in the graph in which it appears. This means that if we remove a municipality with high centrality, it is very possible that the resulting graph will have fewer edges and, in turn, become disconnected.
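Read literally, this definition amounts to counting node memberships. The sketch below assumes the TDA graph is available as a list of node member sets; the single-letter municipality names are hypothetical placeholders.

```python
from collections import Counter


def centrality(nodes):
    """Number of graph nodes in which each municipality appears."""
    counts = Counter()
    for members in nodes:
        counts.update(set(members))  # count each municipality once per node
    return counts


# Hypothetical TDA graph with four nodes.
example_nodes = [{"A", "B"}, {"B", "C"}, {"B", "C", "D"}, {"E"}]
scores = centrality(example_nodes)
```

Here municipality B appears in three nodes and is therefore the most central; removing it would leave the first three nodes with at most one shared member.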

Now, the connection scheme is as follows: only municipalities that appear in several nodes of the TDA graph can have a connection. No municipality is connected to itself. A municipality is connected towards the municipality with the highest centrality in each node it belongs to. Note that there could be municipalities with multiple outgoing connections.
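One possible reading of these rules is sketched below. It is an interpretation, not the code used in the study: the alphabetical tie-break between equally central municipalities and the placeholder names are assumptions.

```python
from collections import Counter


def connections(nodes):
    """Directed links (municipality, hub) following the rules above."""
    cent = Counter(m for members in nodes for m in set(members))
    links = set()
    for members in nodes:
        # Hub: most central municipality in this node.
        # Assumption: ties are broken alphabetically.
        hub = max(sorted(members), key=lambda m: cent[m])
        for m in members:
            # Only municipalities found in several nodes get a connection,
            # and no municipality is connected to itself.
            if m != hub and cent[m] > 1:
                links.add((m, hub))
    return links


# Hypothetical TDA graph with five nodes.
example_nodes = [{"A", "B"}, {"B", "C"}, {"B", "C", "D"}, {"C", "E"}, {"E"}]
links = connections(example_nodes)
```

In this toy graph C points to B and E points to C, while A and D receive no connection because each appears in a single node.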

We also identified five significant municipalities with high centrality in the TDA graph (Fig 8); they are reported in Table 2. These municipalities are responsible for the connection among several nodes in their corresponding subgraphs, appearing in overlapping zones of the selected TDA filter.

4.3. Ethnicity. Figs 10-12 show the histograms of age reports of malaria by ethnicity and cluster. Two distinctive patterns appear consistently throughout different regions of Colombia. First, an endemic risk profile (higher density of cases at young ages) was observed for the indigenous populations of clusters 1 and 2. Second, a pattern of malarial infection suggesting occupational hazard and intensity of infection was observed among the Afrocolombian populations of clusters 1 and 2. Occupational hazard is consistently described for the population with no ethnic denomination across clusters 1-4. In all cases, within the same cluster, we find both endemic and occupational hazard infection patterns across populations segregated by ethnicity.

In a histogram of case reports by age and sex, an occupational hazard risk has a unique and characteristic signature: one age class, typically for only one sex, presents an outstanding number of cases compared to any other age class. In the case of malaria in Colombia, we observed that men with no ethnic denomination of ages 20-25 were contracting malaria far more often than any other class. From this simple observation, we inferred the following: first, men of this age class were engaging in activities that posed a risk of contracting the disease. Second, women were not engaging in this activity, nor were men in other age classes. Third, there was no household transmission, since infected men were not infecting other members of their family once they ceased to engage in the risky activity.
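The "outstanding class" signature described above was read from the histograms in this study. As a purely illustrative aid, one could quantify it by comparing the largest sex and age class with the runner-up. The heuristic, the function name and the case counts below are all hypothetical and are not taken from the data.

```python
def dominant_class(histogram):
    """Return the (sex, age_class) bin with the most cases and the ratio
    of its count to the second-largest bin."""
    ranked = sorted(histogram.items(), key=lambda item: item[1], reverse=True)
    (top_key, top_count), (_, second_count) = ranked[0], ranked[1]
    return top_key, top_count / second_count


# Hypothetical case counts by sex and age class.
counts = {
    ("male", "20-25"): 400,
    ("male", "25-30"): 120,
    ("female", "20-25"): 90,
    ("female", "25-30"): 80,
    ("male", "0-5"): 60,
}
top, ratio = dominant_class(counts)
```

A ratio well above 1 for a single sex and age class would be consistent with the occupational hazard signature, whereas an endemic profile would instead concentrate cases in the youngest classes of both sexes.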


[Figure 10 image: histograms titled "Malaria by species, age, sex and ethnicity in Colombia"; columns: Plasmodium falciparum, Plasmodium vivax; rows: Afro, Indigenous, Other; x-axis: Age; y-axis: Cases (count); legend "Sex": Female, Male.]

Figure 10. Malaria by parasite species, age, sex and ethnicity group of human cases in Colombia. The indigenous ethnic group shows a pattern of endemicity, with most cases being reported for the youngest ages, while people with no ethnic denomination and the Afrocolombian population show a pattern consistent with occupational hazard risk.

[Figure 11 image: histograms titled "Plasmodium falciparum by cluster, age, sex and ethnicity in Colombia"; columns: Afro, Indigenous, Other; rows: Cluster 1 to Cluster 5; x-axis: Age; y-axis: Cases (count); legend "Sex": Female, Male.]

Figure 11. Malarial infection by Plasmodium falciparum by cluster, age, sex and ethnicity in Colombia. Histograms for the indigenous population in clusters 1 and 2 suggest that these populations experience intense exposure to malarial infection. Similarly, histograms for the Afrocolombian population in clusters 1 and 2 suggest lower intensity of malarial infection, with cluster 1 experiencing more occupational hazard than cluster 2. Histograms for the population with no ethnic denomination in clusters 1, 2, and 4 suggest that malarial infection is associated with occupational hazard. Clusters 3 and 5 have too few cases to infer either transmission intensity or occupational hazard.

[Figure 12 image: histograms titled "Plasmodium vivax by cluster, age, sex and ethnicity in Colombia"; columns: Afro, Indigenous, Other; rows: Cluster 1 to Cluster 5; x-axis: Age; y-axis: Cases (count); legend "Sex": Female, Male.]

Figure 12. Malarial infection by Plasmodium vivax by cluster, age, sex and ethnicity in Colombia. Histograms for the indigenous population in clusters 1, 2, and 3 suggest intense exposure to malarial infection among these populations, with cluster 2 experiencing the most intense exposure. Histograms for the Afrocolombian populations of clusters 1, 2, and 3 suggest some degree of occupational hazard transmission. The population with no ethnic denomination experiences malarial infection as an occupational hazard in all clusters except cluster 5. It is interesting that cluster 3 is the only cluster where an occupational hazard exists without another population experiencing endemic malaria in the same region.

5. Conclusions

Malaria in Colombia was characterized by a different intensity, connectivity and segregation in each region. While there was a general pattern of risk throughout the country associated with occupational hazard, some populations experienced intense malaria exposure in endemic pockets. Understanding the interaction of such pockets is fundamental for designing appropriate malarial control strategies. Here we have produced a systematic approach to the analysis of malaria along three dimensions.

Colombia experienced a generalized malaria outbreak in the Amazon region for the period studied. We found that there was little connectivity among the municipalities that composed the Amazon region, and that this outbreak was spatially connected to the Cauca Basin in Northern Antioquia. In the clusters connected to the Amazon region, where there was a relatively high degree of cultural diversity, indigenous populations experienced malaria in endemic patterns, contrary to the risk of the ND population for both the region and the country.

The Cauca Basin was characterized by a different pattern: the Afrocolombian population experienced a segregated exposure to Plasmodium falciparum in a way that no other ethnic group did. The pattern observed for most of the country is not consistent across the Pacific region, where Plasmodium falciparum also persisted at a relatively higher prevalence than in the rest of the country in comparison to Plasmodium vivax. Most interestingly, the Cauca Basin region contained two different populations that lived in pockets of endemicity, while it was also synchronous with other regions; furthermore, it was part of a region where the scan statistic algorithm detected an outbreak.

Our findings have potential implications for malarial infection control. First, we found that malaria in Colombia did present different, isolated pockets with distinctive epidemic characteristics. The magnitude of such differences in epidemic characteristics is relevant in studying the pressure of anti-malarials upon the parasite, since the emergence of resistance has been reported in the country. We found that Plasmodium falciparum was particularly acute among the Afrocolombian population of the Cauca Basin and the Pacific region. However, in the Cauca Basin, it constituted an isolated outbreak, while in the Pacific, the outbreak was dispersed among both the Afrocolombian and the indigenous populations. Different parasite loads among ethnically and culturally distinct populations constitute the quintessential mechanism of selective pressures that are ideal for the evolution of parasites. The diversity of epidemic characteristics of malarial infection among the subpopulations of Colombia accounts for an ideal environment for parasite evolution, where plasmodia persist under different pressures of asymptomatic individuals, susceptible classes of ethnically distinct populations, and public health interventions using different anti-malarial strategies. Such diversity provides the necessary conditions, acting as isolated experiments and then sharing “successful” results, for the emergence of resistant parasites.

Second, the patterns of endemicity observed in these populations suggested that prevention efforts should be population specific, and vary according to the epidemic characteristics exhibited by the parasite in the targeted population. Therapeutic failures have been suggested to be correlated with high intestinal parasite loads [16]. The effectiveness of bed nets has been reported to be low among populations that experience intense malaria exposure [28]. We have identified populations that experienced malaria endemicity, where prevention efforts focused on the distribution of bed nets. Our findings, combined with previous knowledge, suggest that public health interventions should integrate two aspects: 1) diagnosis and treatment of asymptomatic malaria; and 2) diagnosis and treatment of intestinal parasites (to reduce therapeutic failure).

Third, prevention strategies focusing on populations with endemic malaria would yield a reduction of occupational hazard malaria, since the occupational hazard is associated with visiting locations where malaria persists endemically.


[Figure 13 image: map of Colombia titled "Mean Malaria incidence by Municipality"; axes: long, lat; legend "Mean Cases Year" with classes (0,0.1], (0.1,0.8], (0.8,1.2], (1.2,5.6], (5.6,13.4], (13.4,65], (65,109], (109,347.6], (347.6,956.2], (956.2,3972.3].]

Figure 13. Malarial incidence for both species in Colombia from 2007-2015. Intervals were constructed using the Jenks procedure [76].

6. Additional Figures

Acknowledgments

The first author would like to thank Stanford University, the Zaffaroni family, and the Morrison Institute for their financial support. The second author acknowledges and thanks the financial support of the grant P12.160422.004/01- FAPA ANDRES ANGEL from Vicedecanatura de Investigaciones de la Facultad de Ciencias de la Universidad de los Andes, Colombia. The third author acknowledges and thanks the financial support of the grant Proyecto Semilla 2017-1 from Vicedecanatura de Investigaciones de la Facultad de Ciencias de la Universidad de los Andes, Colombia.

[Figure 14 image: map of Colombia titled "Mean Plasmodium falciparum incidence by Municipality"; axes: long, lat; legend "Mean Cases Year" with classes (0,0.1], (0.1,0.4], (0.4,5.7], (5.7,134.4], (134.4,1415.4].]

Figure 14. Plasmodium falciparum incidence in Colombia from 2007-2015. Intervals were constructed using the Jenks procedure [76].


[Figure 15 image: map of Colombia titled "Mean Plasmodium vivax incidence by Municipality"; axes: long, lat; legend "Mean Cases Year" with classes (0,0.1], (0.1,0.7], (0.7,1.1], (1.1,3.4], (3.4,8], (8,20.6], (20.6,72.4], (72.4,123.4], (123.4,724.7], (724.7,2980.1].]

Figure 15. Plasmodium vivax incidence in Colombia from 2007-2015. Intervals were constructed using the Jenks procedure [76].

References

[1] Rubio-Palis Y, Zimmerman RH. Ecoregional classification of malaria vectors in the neotropics. Journal of Medical Entomology. 1997;34(5):499–510.

[2] Sáenz R, Molina JA, Quiñones ML, Brochero HL, Olano VA. Mapas preliminares de la distribución de especies de Anopheles vectores de malaria en Colombia. Biomédica. 2001;21(4):402–408.

[3] Carmona-Fonseca J. La malaria en Colombia, Antioquia y las zonas de Urabá y Bajo Cauca: panorama para interpretar la falla terapéutica antimalárica. Parte 2. Iatreia. 2004;17(1):34–53.


[4] Alexander N, Rodriguez M, Perez L, Caicedo J, Cruz C, Prieto G, et al. Case-control study of mosquito nets against malaria in the Amazon Region of Colombia. American Journal of Tropical Medicine and Hygiene. 2005;73(1):140–148.

[5] Arévalo-Herrera M, Lopez-Perez M, Medina L, Moreno A, Gutierrez JB, Herrera S. Clinical profile of Plasmodium falciparum and Plasmodium vivax infections in low and unstable malaria transmission settings of Colombia. Malaria Journal. 2015;14(1):154.

[6] Poveda G, Rojas W, Quiñones ML, Vélez ID, Mantilla RI, Ruiz D, et al. Coupling between annual and ENSO timescales in the malaria-climate association in Colombia. Environmental Health Perspectives. 2001;109(5):489.

[7] Gagnon AS, Smoyer-Tomic KE, Bush AB. The El Niño Southern Oscillation and malaria epidemics in South America. International Journal of Biometeorology. 2002;46(2):81–89.

[8] Valero-Bernal MV. Malaria in Colombia: retrospective glance during the past 40 years. Revista de Salud Pública. 2006;8(3):141–149.

[9] WHO. Malaria entomology and vector control. World Health Organization; 2013.

[10] Hill AV. Malaria resistance genes: a natural selection. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1992;86(3):225–232.

[11] Greenwood B, Marsh K, Snow R. Why do some African children develop severe malaria? Parasitology Today. 1991;7(10):277–281.

[12] Rosario V. Genetics of chloroquine resistance in malaria parasites. Nature Publishing Group. 1976;261(5561):585–586.

[13] Wellems TE, Plowe CV. Chloroquine-resistant malaria. Journal of Infectious Diseases. 2001;184(6):770–776.

[14] Soto J, Toledo J, Gutierrez P, Luzz M, Llinas N, Cedeno N, et al. Plasmodium vivax clinically resistant to chloroquine in Colombia. The American Journal of Tropical Medicine and Hygiene. 2001;65(2):90–93.

[15] Castillo CM, Osorio LE, Palma GI. Assessment of therapeutic response of Plasmodium vivax and Plasmodium falciparum to chloroquine in a malaria transmission free area in Colombia. Memórias do Instituto Oswaldo Cruz. 2002;97(4):559–562.

[16] Blair-Trujillo S, Lacharme-Lora L, Carmona-Fonseca J. Resistance of Plasmodium falciparum to antimalarial drugs in Zaragoza (Antioquia, Colombia), 1998. Memórias do Instituto Oswaldo Cruz. 2002;97(3):401–406.

[17] Vallejo AF, Chaparro PE, Benavides Y, Álvarez Á, Quintero JP, Padilla J, et al. High prevalence of sub-microscopic infections in Colombia. Malaria Journal. 2015;14(1):201.

[18] Ortega DC, Fong C, Cardenas H, Barreto G. Evidence of over-dominance for sickle cell trait in a population sample from Buenaventura, Colombia. International Journal of Genetics and Molecular Biology. 2015;7(1):1–7.

[19] Ruiz D, Poveda G, Vélez ID, Quiñones ML, Rúa GL, Velásquez LE, et al. Modelling entomological-climatic interactions of Plasmodium falciparum malaria transmission in two Colombian endemic-regions: contributions to a National Malaria Early Warning System. Malaria Journal. 2006;5(1):66.

[20] Bouma M, Poveda G, Rojas W, Chavasse D, Quiñones M, Cox J, et al. Predicting high-risk years for malaria in Colombia using parameters of El Niño Southern Oscillation. Tropical Medicine & International Health. 1997;2(12):1122–1127.

[21] Kulldorff M. A Spatial Scan Statistic. Communications in Statistics: Theory and Methods. 1997;26(6):1481–1496.

[22] Kulldorff M, Athas W, Feurer E, Miller B, Key C. Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health. 1998;88(9):1377–1380.

[23] Hjalmars U, Kulldorff M, Gustafsson G, Nagarwalla N. Childhood leukaemia in Sweden: using GIS and a spatial scan statistic for cluster detection. Statistics in Medicine. 1996;15(7-9):707–715.

[24] Burkom HS. Biosurveillance applying scan statistics with multiple, disparate data sources. Journal of Urban Health. 2003;80(1):i57–i65.

[25] Singh G, Mémoli F, Carlsson G. Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. Prague, Czech Republic: Eurographics Association; 2007. p. 91–100.

[26] Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M, Alagappan M, et al. Extracting insights from the shape of complex data using topology. Scientific Reports. 2013;3:1236.


[27] Cameron D, Jones IG. John Snow, the Broad Street pump and modern epidemiology. International Journal of Epidemiology. 1983;12(4):393–396.

[28] Snow R, Marsh K, Le Sueur D. The need for maps of transmission intensity to guide malaria control in Africa. Parasitology Today. 1996;12(12):455–457.

[29] Kilama M, Smith DL, Hutchinson R, Kigozi R, Yeka A, Lavoy G, et al. Estimating the annual entomological inoculation rate for Plasmodium falciparum transmitted by Anopheles gambiae s.l. using three sampling methods in three sites in Uganda. Malaria Journal. 2014;13(1):111.

[30] Omumbo J, Ouma J, Rapuoda B, Craig M, Lesueur D, Snow R. Mapping malaria transmission intensity using geographical information systems GIS: an example from Kenya. Annals of Tropical Medicine and Parasitology. 1998;92(1):7–21.

[31] Kleinschmidt I, Bagayoko M, Clarke G, Craig M, Le Sueur D. A spatial statistical approach to malaria mapping. International Journal of Epidemiology. 2000;29(2):355–361.

[32] Kitron U, Pener H, Costin C, Orshan L, Greenberg Z, Shalom U. Geographic information system in malaria surveillance: mosquito breeding and imported cases in Israel, 1992. The American Journal of Tropical Medicine and Hygiene. 1994;50(5):550–556.

[33] Beck LR, Rodriguez MH, Dister SW, Rodriguez AD, Rejmankova E, Ulloa A, et al. Remote sensing as a landscape epidemiologic tool to identify villages at high risk for malaria transmission. The American Journal of Tropical Medicine and Hygiene. 1994;51(3):271–280.

[34] Candia J, González MC, Wang P, Schoenharl T, Madey G, Barabási AL. Uncovering individual and collective human dynamics from mobile phone records. Journal of Physics A: Mathematical and Theoretical. 2008;41(22):224015.

[35] Balcan D, Colizza V, Gonçalves B, Hu H, Ramasco JJ, Vespignani A. Multiscale mobility networks and the spatial spreading of infectious diseases. PNAS. 2009;106(51).

[36] Bharti N, Xia Y, Bjornstad ON, Grenfell BT. Measles on the edge: coastal heterogeneities and infection dynamics. PLoS One. 2008;3(4):e1941.

[37] Ferrari M, Djibo A, Grais R, Grenfell B, Bjørnstad O. Episodic outbreak bias estimates of age-specific force of infection: a corrected method using measles as an example. Epidemiology and Infection. 2010;138(01):108–116.

[38] Ferrari MJ, Djibo A, Grais RF, Bharti N, Grenfell BT, Bjornstad ON. Rural–urban gradient in seasonal forcing of measles transmission in Niger. Proceedings of the Royal Society B: Biological Sciences. 2010;277(1695):2775–2782.

[39] Bharti N, Tatem AJ, Ferrari MJ, Grais RF, Djibo A, Grenfell BT. Explaining seasonal fluctuations of measles in Niger using nighttime lights imagery. Science. 2011;334(6061):1424–1427.

[40] Bishop YM. Statistical methods for hazards and health. Environmental Health Perspectives. 1977;20:149.

[41] Cazelles B, Chavez M, de Magny GC, Guégan JF, Hales S. Time-dependent spectral analysis of epidemiological time-series with wavelets. Journal of the Royal Society Interface. 2007;4(15):625–636.

[42] Pascual M, Rodó X, Ellner SP, Colwell R, Bouma MJ. Cholera dynamics and El Niño-southern oscillation. Science. 2000;289(5485):1766–1769.

[43] Pascual M, Bouma MJ, Dobson AP. Cholera and climate: revisiting the quantitative evidence. Microbes and Infection. 2002;4(2):237–245.

[44] Pascual M, Ahumada J, Chaves L, Rodo X, Bouma M. Malaria resurgence in the East African highlands: temperature trends revisited. Proceedings of the National Academy of Sciences. 2006;103(15):5829–5834.

[45] Grassly NC, Fraser C, Garnett GP. Host immunity and synchronized epidemics of syphilis across the United States. Nature. 2005;433(7024):417–421.

[46] Duncan C, Duncan S, Scott S. Whooping cough epidemics in London, 1701-1812: infection dynamics, seasonal forcing and the effects of malnutrition. Proceedings of the Royal Society of London Series B: Biological Sciences. 1996;263(1369):445–450.

[47] Rohani P, Green C, Mantilla-Beniers N, Grenfell B. Ecological interference between fatal diseases. Nature. 2003;422(6934):885–888.

[48] Cazelles B, Cazelles K, Chavez M. Wavelet analysis in ecology and epidemiology: impact of statistical tests. Journal of The Royal Society Interface. 2014;11(91):20130585.

[49] Grenfell B, Bjørnstad O, Kappey J. Travelling waves and spatial hierarchies in measles epidemics. Nature. 2001;414(6865):716–723.


[50] Cazelles B, Chavez M, McMichael AJ, Hales S. Nonstationary influence of El Nino on the

synchronous dengue epidemics in Thailand. PLoS Medicine. 2005;2(4):e106.

[51] Kreppel KS, Caminade C, Telfer S, Rajerison M, Rahalison L, Morse A, et al. A Non-Stationary Relationship between Global Climate Phenomena and Human Plague Incidence

in Madagascar. PLoS Neglected Tropical Diseases. 2014;8(10).

[52] Onozuka D. Effect of non-stationary climate on infectious gastroenteritis transmission inJapan. Scientific reports. 2014;4.

[53] Morris A, Gozlan RE, Hassani H, Andreou D, Couppi P, Gugan JF. Complex temporal

climate signals drive the emergence of human water-borne disease. Emerging Microbes &Infections. 2014;3(8):e56.

[54] Chaves LF, Pascual M. Climate cycles and forecasts of cutaneous leishmaniasis, a nonstationary vector-borne disease. PLoS Medicine. 2006;3(8):e295.

[55] José MV, Bishop RF. Scaling properties and symmetrical patterns in the epidemiology of rotavirus infection. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2003;358(1438):1625–1641.

[56] Cuzick J, Edwards R. Spatial clustering for inhomogeneous populations. Journal of the Royal Statistical Society Series B (Methodological). 1990; p. 73–104.

[57] Openshaw S, Charlton M, Wymer C, Craft A. A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information System. 1987;1(4):335–358.

[58] Snow R, Schellenberg JA, Peshu N, Forster D, Newton C, Winstanley P, et al. Periodicity and space-time clustering of severe childhood malaria on the coast of Kenya. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1993;87(4):386–390.

[59] Brooker S, Clarke S, Njagi JK, Polack S, Mugo B, Estambale B, et al. Spatial clustering of malaria and associated risk factors during an epidemic in a highland area of western Kenya. Tropical Medicine and International Health. 2004;9(7):757–766.

[60] Coleman M, Coleman M, Mabuza AM, Kok G, Coetzee M, Durrheim DN. Using the SaTScan method to detect local malaria clusters for guiding malaria control programmes. Malar J. 2009;8(68):10–1186.

[61] Zhang W, Wang L, Fang L, Ma J, Xu Y, Jiang J, et al. Spatial analysis of malaria in Anhui province, China. Malaria Journal. 2008;7(206):19.

[62] Faires MC, Pearl DL, Ciccotelli WA, Berke O, Reid-Smith RJ, Weese JS. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006–2011. BMC Infectious Diseases. 2014;14(1):254.

[63] Duczmal LH, Moreira GJ, Paquete L, Menotti D, Takahashi R, Burgarelli D. Dry Climate as a Predictor of Chagas Disease Irregular Clusters: A Covariate Study. Online Journal of Public Health Informatics. 2015;7(1).

[64] Occelli F, Deram A, Génin M, Noel C, Cuny D, Glowacki F, et al. Mapping End-Stage Renal Disease (ESRD): Spatial Variations on Small Area Level in Northern France, and Association with Deprivation. PLoS ONE. 2014;9(11).

[65] Li J, Kolivras KN, Hong Y, Duan Y, Seukep SE, Prisley SP, et al. Spatial and Temporal Emergence Pattern of Lyme Disease in Virginia. The American Journal of Tropical Medicine and Hygiene. 2014;91(6):1166–1172.

[66] Rogers DJ, Randolph SE. The global spread of malaria in a future, warmer world. Science. 2000;289(5485):1763–1766.

[67] Nicolau M, Levine AJ, Carlsson G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(17):7265–7270.

[68] Ibekwe AM, Ma J, Crowley DE, Yang CH, Johnson AM, Petrossian TC, et al. Topological data analysis of Escherichia coli O157:H7 and non-O157 survival in soils. Frontiers in Cellular and Infection Microbiology. 2014;4:122.

[69] Nielson JL, Paquette J, Liu AW, Guandique CF, Tovar CA, Inoue T, et al. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nature Communications. 2015;6:8581.

[70] Sardiu ME, Gilmore JM, Groppe B, Florens L, Washburn MP. Identification of Topological Network Modules in Perturbed Protein Interaction Networks. Scientific Reports. 2017;7:43845.


[71] Neill DB. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting. 2009;25(3):498–517.

[72] Neill D, Wong W. A tutorial on event detection. Presented at the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2009.

[73] Kim AY, Wakefield J. SpatialEpi: Methods and Data for Spatial Epidemiology; 2016.

[74] da Silva-Nunes M, Moreno M, Conn JE, Gamboa D, Abeles S, Vinetz JM, et al. Amazonian malaria: asymptomatic human reservoirs, diagnostic challenges, environmentally driven changes in mosquito vector populations, and the mandate for sustainable control strategies. Acta Tropica. 2012;121(3):281–291.

[75] Alves FP, Gil LHS, Marrelli MT, Ribolla PE, Camargo EP, Pereira Da Silva LH. Asymptomatic carriers of Plasmodium spp. as infection source for malaria vector mosquitoes in the Brazilian Amazon. Journal of Medical Entomology. 2005;42(5):777–779.

[76] Jenks GF, Caspall FC. Error on choroplethic maps: definition, measurement, reduction. Annals of the Association of American Geographers. 1971;61(2):217–244.