6318

Upload: zarcone7

Post on 01-Mar-2016

212 views

Category:

Documents


0 download

DESCRIPTION

g

TRANSCRIPT

  • 1083

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    ARTIGO ARTICLE

    Spatial statistical methods in health

    Mtodos estatsticos espaciais em sade

    1 School of MathematicalSciences, University of Exeter.Exeter, EX4 4QE, UK.

    Trevor C. Bailey 1

    Abstract The study of the geographical distribution of disease incidence and its relationship topotential risk factors (referred to here as geographical epidemiology) has provided, and contin-ues to provide, rich ground for the application and development of statistical methods and mod-els. In recent years increasingly powerful and versatile statistical tools have been developed inthis application area. This paper discusses the general classes of problem in geographical epi-demiology and reviews the key statistical methods now being employed in each of the applica-tion areas identified. The paper does not attempt to exhaustively cover all possible methods andmodels, but extensive references are provided to further details and to additional approaches.The overall aim is to provide a picture of the current state of the art in the use of spatial statis-tical methods in epidemiological and public health research. Following the review of methods,the main software environments which are available to implement such methods are discussed.The paper concludes with some brief general reflections on the epidemiological and publichealth implications of the use of spatial statistical methods in health and on associated benefitsand problems.Key words Spatial Analysis; Spatial Statistics; Statistical Models; Epidemiology

    Resumo O estudo da distribuio geogrfica da incidncia de doenas e da sua relao com fa-tores de risco potenciais (chamada aqui de epidemiologia geogrfica) vem constituindo umterreno frtil para a aplicao e desenvolvimento de mtodos e modelos estatsticos. Nos ltimosanos, foram desenvolvidas ferramentas cada vez mais poderosas e versteis nesta rea de aplica-o. O artigo discute as classes gerais de problemas na epidemiologia geogrfica e faz uma revi-so dos principais mtodos estatsticos utilizados atualmente em cada uma das reas de aplica-o identificadas. O artigo no procura cobrir exaustivamente todos os medos e modelos poss-veis, mas fornece referncias bibliogrficas para outros detalhes e abordagens. O objetivo geral dar um panorama do estado da arte no uso de mtodos estatsticos espaciais na pesquisa emepidemiologia e sade pblica. Aps a reviso metodolgica, o autor discute os principais am-bientes de software atualmente disponveis para implementar tais mtodos. O artigo concluicom algumas reflexes gerais sobre as implicaes, para a epidemiologia e a sade pblica, douso de mtodos estatsticos espaciais em sade, alm dos benefcios e problemas associados.Palavras-chave Anlise Espacial; Estatstica Espacial; Modelos Estatsticos; Epidemiologia

  • BAILEY, T. C.1084

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    Introduction

    The concerns of geographical epidemiology

    The analysis of the geographical distribution ofthe incidence of disease and its relationship topotential risk factors has an important role toplay in various kinds of public health and epi-demiological studies. For the purposes of thispaper this general area is referred to as geo-graphical epidemiology and four broad areasof statistical interest are identified:

    Disease Mapping focusses on producing amap of the true underlying geographical distri-bution of the disease incidence, given noisyobserved data on disease rates. This may beuseful in suggesting hypotheses for further in-vestigation or as part of general health surveil-lance and the monitoring of health problems.For example, in assisting to detect the outbreakof a possible epidemic, or in identifying signifi-cant trends in disease rates over time, or in par-ticular geographical localities.

    Ecological Studies are concerned with study-ing associations between observed incidenceof disease and potential risk factors as mea-sured on groups rather than individuals, wherethese groups are typically defined by geograph-ical areas. Such studies are valuable in investi-gating the aetiology of disease and may help totarget further research, and possibly preventa-tive measures.

    Disease Clustering Studies focus on identi-fying geographical areas with significant ele-vated risk of disease, or on assessing the evi-dence of elevated risk around putative sourcesof hazard. Uses include the targeting of followup studies to ascertain reasons for identifiedclustering in disease occurence, or the initia-tion of control measures where the aetiology ofidentified clustering is established.

    Environmental Assessment and Monitoringis concerned with ascertaining the spatial dis-tribution of environmental factors relevant tohealth and exposure to these, so as to establishnecessary controls or take preventative action.

    Sub-areas exist under any one of these fourmain headings depending on the particularepidemiological or public health context andupon whether data is available on individualcases of a disease, or only at the level of geo-graphical area, and upon whether there is atemporal as well as a spatial dimension to theanalysis. The distinction between the four maintypes of study is also somewhat blurred in prac-tice. For example, good disease incidence mapsoften play an important preliminary role instudies of disease clustering, disease mapping

    commonly incorporates relationships with co-variates representing known risk factors for thedisease, and putative hazards are sometimesusefully viewed as particular kinds of covariatein the analysis, while environmental assess-ment may well be the prelude to a study de-signed to investigate whether there is a rela-tionship between some suspected risk factorand disease incidence.

    These provisos accepted, the division of ge-ographical epidemiological concerns into fourmain areas provides a useful structure underwhich to review associated statistical methodsin the subsequent sections of this article. At thesame time it should be appreciated that al-though the focus of this paper is on geographi-cal epidemiology, the concerns discussed un-der these four headings can be extended to abroader public health context; for instance, inplanning the location of health services basedon relative risk, in estimating immunizationcoverage, or in assisting to design other formsof disease prevention or health education pro-grams. We do not explicitly comment on thesebroader uses, but they should be borne inmind in relation to the spatial methods subse-quently discussed.

    Statistical methods in geographical epidemiology

    Given the breadth and importance of the con-cerns in geographical epidemiology, as out-lined in the previous section, it is not surpris-ing that there has been considerable interest inthe area in recent years. Much of this interesthas been in the development of relevant statis-tical methods and techniques and there is nodoubt that this particular vein of research hasbeen, and continues to be, a fruitful source ofinteresting statistical problems, motivating suc-cessful methodological developments withinthat discipline.

    Several issues of major statistical journalshave been devoted to spatial statistical meth-ods in health applications (e.g. American Jour-nal of Epidemiology, 132: S1S202 1990; Jour-nal of Royal Statistical Society, Series A, 152-1989; Journal of Royal Statistical Society, SeriesD, 47-1989; Statistics in Medicine, 12-1993, 14-1995, 15-1996. There has also been a consider-able volume of papers in the field publishedseparately, key journals including: AmericanJournal of Epidemiology; Biometrics; Biometri-ka; Environmetrics; Journal of Environmentaland Ecological Statistics; Geographical Analy-sis; IEEE Transactions on GeoScience and Re-mote Sensing; Journal of Royal Statistical Soci-

  • Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    ety, Series A, B, C, & D; Journal of American Sta-tistical Association; Mathematical Geology; Sta-tistics in Medicine and Statistical Science. In ad-dition a number of significant recent texts havebeen devoted to this subject area (e.g. Elliot etal., 1996; Gatrell & Loytonen, 1998; Halloran &Greenhouse, 1997; Lawson et al., 1999a)

    There have also been various special initia-tives concerned with statistical methods in ge-ographical epidemiology. A notable examplewas the 1997 international workshop in Romein conjunction with the European initiative ondisease mapping and risk assessment and theWHO European Centre for Environment andHealth (see Lawson et al., 1999a). Some of thework conducted under the European Spatialand Computational Statistics Network has alsobeen particularly relevant to spatial epidemiol-ogy and this network has now held two inter-national workshops (Aussois, France, 1998;Crete, Greece, 1999). In addition to these, a con-siderable amount of statistical work has beenconducted under the aupices of other agencieswith long term interests in geographical andenvironmental health issues, e.g. the Centresfor Disease Control and Prevention (CDC); theEnvironmental Protection Agency (EPA); theNational Research Centre for Statistics and theEnvironment; the Pan American Health Orga-nization (PAHO); the World Health Organiza-tion (WHO) and various European Communitygovernment agencies.

    The net benefits of the statistical efforts as-sociated with all this activity are difficult tojudge. Certainly we now have statistical toolsthat are capable of addressing much more com-plex situations than was the case say ten yearsago. However, the epidemiological and publichealth implications and benefits arising fromthe use of such methods are more difficult toassess. This point is returned to in the final sec-tion of this article after looking at some of thestatistical developments in more detail in pre-ceding sections (each of which relates to one ofthe four main headings identified earlier andafter considering the software environmentswhich are available to implement methods dis-cussed.

    One general point worth making at thisstage is that although certain of the four areasreviewed are characterized by specialized sta-tistical methods, there is also considerableoverlap in the statistical modelling that hasbeen employed in any one of them. For exam-ple, disease clustering studies have given riseto an extensive and rather specialized literatureon hypothesis tests for either focussed orunfocussed clusters of disease. Environmen-

    SPATIAL STATISTICS 1085

    tal assessment also inevitably involves a focuson specialised spatial interpolation methods,some of which are derived from the geostatisti-cal literature. However, these kinds of excep-tions aside, a distinctive feature of much of therecent modelling work in all areas is a Bayesianapproach. Indeed, the various areas of geo-graphical epidemiology have all provided a veryfruitful area for the application of Bayesianmodels and associated Markov Chain MonteCarlo (MCMC) methodology. As will be seen insubsequent sections, the application of Bayesiantechniques in Disease Mapping, in EcologicalStudies, in Disease Clustering Studies and in En-vironmental Assessment and Monitoring is nowwell-established and accompanied by an ex-tensive and growing literature.

    Data types in geographical epidemiology

    Before embarking on a review of methods un-der the four topic headings discussed earlier, itis useful to make some broad distinctions inthe types of data that might have to be dealtwith in any of these areas, since that will assistto further categorize the methods under eachheading.

    Broadly speaking, there are essentially fourkinds of data which have to be considered. Someproblems may involve simply one of these typesof data, but often mixtures of data types maybe involved. Some methods discussed in sub-sequent sections may only be appropriate to aspecific data type, others may be able to be ap-plied (or modified to apply) to more than onedata type. The four data types are:

    Irregular lattice data measures aggregat-ed/averaged to the level of census tracts or oth-er type of administrative district. Could becounts of cases or population at risk, socio-economic measures, environmental assess-ments etc.

    Case-event data locations (usually residen-tial) of individual cases of a disease, or of indi-vidual members of a suitable control group(population at risk). Covariates may also bemeasured on each individual.

    Geostatistical data measurements (usual-ly of an environmental nature) sampled at pointlocations.

    Regular lattice data measures aggregat-ed/averaged to a regular grid (typically arisingfrom remote sensing).

    In any of the above cases there could alsobe a temporal dimension as well as a spatialdimension to the data. For example, we mighthave case-event data on a disease where boththe spatial location of cases and the time of

  • BAILEY, T. C.1086

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    onset of the disease is recorded. Environmen-tal data is also often collected in both spaceand time.

    Disease mapping

    Maps of disease incidence have always playeda key descriptive role in spatial epidemiology.They are useful for several purposes such as:identification of areas with suspected eleva-tions in risk of a disease, assisting in the formu-lation of hypotheses about disease aetiologyand assessing potential needs for geographicalvariation in follow-up studies, preventativemeasures, or other forms of health resource al-location.

    From a statistical point of view the problemof disease mapping amounts to obtaining agood estimate of the geographic heterogene-ity of the disease rate over the study area. Theobvious approach is to map standardized rates,but many of the diseases of interest are rela-tively uncommon and observed SMRs there-fore have high natural variability with extremevalues tending to occur in areas with the small-est populations. The areas of greatest potentialinterest are thus often associated with the leastreliable data. One therefore seeks methods toproduce a more reliable map of the underlyinggeographical variation in disease rates whichreduces excess local variability at the same timeas correcting for variations produced by popu-lation age/sex variations or other well-knownrisk factors. Methodologically, there are con-ceptual similarities to general statistical tech-niques designed to clean observed spatial im-agery.

    Most commonly, the observed data on dis-ease incidence is aggregated to an irregular lat-tice i.e. counts of cases and corresponding pop-ulations in area units. However, as health infor-mation systems steadily improve, there is anincreasing demand for methods that can beused with case event data, where more preciselocations of cases (usually residential address-es) are known. Models for the two different da-ta types are dealt with separately below.

    Mapping aggregated data

    There are several different statistical approach-es, but the focus here is on what has emergedas the mainstream methodology that basedon Bayesian hierarchical models. The basicmodel employed is that the observed counts ofcases, y = (y1 , . . . , yn ), in the different areas,each follow a Poisson distribution with mean

    i = ei i , where ei is the expected number ofcases in each area (based upon the populationat risk and suitable overall reference rates forthe disease) and i is the relative disease risk inarea i.

    Generally the expected number of cases eiis assumed known and often incorporates strati-fication corrections for known confounders,such as age and sex (i.e. ei = j rj pij , where rjare known group overall reference rates and pijis the population of type j in area i). In the caseof such direct standardization, modelling of-ten focusses on the log relative risk of the dis-ease i = log i.

    If the i are taken as fixed effects then theirmaximum likelihood estimates are simply

    i = log (yi ),ei

    i.e. the relative risk estimates are just the tradi-tional SMRs. But, as mentioned previously,SMRs in small areas may be unreliable becausethe most extreme SMRs are often based on on-ly a few cases. Some smoothing of the rawSMRs is therefore incorporated into the modelby taking the i as random effects. Essentiallythis allows for overdispersion in the Poissonmodel caused by unobserved confounding fac-tors (e.g. see Clayton & Bernardinelli, 1996;Molli, 1995).

    The most common method of estimatingthe vector of random effects = (1 , . . . , n)is to adopt a Bayesian approach. Each i is as-sumed to arise from a suitable prior distribu-tion with relevant hyperparameters each ofwhich in turn arise from a suitably non-infor-mative hyperprior distribution. Various spec-ifications of the prior and hyperprior distribu-tions are possible (e.g. see Bernardinelli et al.,1995a), but a typical choice in disease mappingis to take i ~N( ,

    2) with the non-informa-

    tive hyperpriors being a normal distributionfor the hyperparameter and a gamma distri-bution for the hyperparameter 1/2, with largevariances in both cases. In general if P (|) de-notes the chosen prior distribution involving avector of hyperparameters and if P () is theassociated joint hyperprior, then the joint pos-terior distribution of all of the parameters giv-en the data y is derived from the relationship:

    P (, |y) P (y|)P (|)P ()

    Given P (, |y), the parameters of interest,, are then estimated from this posterior distri-bution via = E(|y). Unfortunately, direct math-ematical derivation of the posterior P (, |y)from the above relationship involves a high-di-mensional integration to obtain the constant

  • SPATIAL STATISTICS 1087

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    of proportionality (the normalising constant)and is not mathematically tractable. Therefore,in practice, either empirical Bayes methods orMonte Carlo simulation is used to approximateP (, |y) indirectly.

    In empirical Bayes (e.g. Clayton & Kaldor,1987; Devine & Louis, 1994; Martuzzi & Elliott,1996) the unknown vector of hyperparametersis replaced by suitable estimate . The problemof deriving the posterior then simplifies, sincethe corresponding relationship is now:

    P (|y, ) P (y|)P (| )

    and this can be handled by direct mathemati-cal analysis. Commonly, the hyperparameterestimates that are used are their maximumlikelihood estimates from the marginal likeli-hood P(y|) = P(y|)P(|)d. In that case isobtained from information pertaining to theoverall map structure (hence the terminologyempirical the hyperparameters are estimat-ed from global aspects of the same data set).

    The problem with the empirical approachis that it makes no allowance for uncertainty inestimating the hyperparameters are simplyreplaced by their estimates assuming these tobe error free (e.g. see Bernardinelli & Mont-moli, 1992). In the MCMC approach the full hy-perprior framework is used, but rather than at-tempt to determine P(, |y) by direct mathe-matical analysis, instead observations are indi-rectly simulated from this posterior usingMCMC methods (e.g. see Brooks, 1998; Gilks etal., 1996). The desired parameter estimates arethen calculated from relevant sample statisticsof the simulated values from P(, |y). The ba-sic idea of the MCMC approach is to simulatevalues from a Markov Chain whose equilibri-um distribution is the same as the posteriordistribution of interest. This is achieved via thegeneral Metropolis algorithm which only re-quires the complex joint posterior distribution,P(, |y), to be specified up to the normalizingconstant (e.g. see Gilks et al., 1993). One partic-ular variant of the general Metropolis algo-rithm known as Gibbs sampling (e.g. see Gilkset al., 1993) is convenient when conditionalposterior distributions of each parameter giv-en all the others are available up to a normaliz-ing constant (as is the case here and often inspatial models more generally) see Gilks et al.,(1993). Gibbs sampling is implemented in theBUGS or WinBUGS computer packages (Spiegel-halter et al., 1997) which provide a relativelyeasy way to fit a large range of Bayesian mod-els. It consists of visiting each parameter in turn(i.e. here each i in and each hyperparameter

    in ) and simulating a new value for this para-meter from its full conditional distribution giv-en the current values for the remaining para-meters (i.e. here from P (i|ji , , y) etc.)

    Regardless of which particular variant ofthe Metropolis algorithm is adopted, after dis-carding a suffcient number of initial burn insimulations the MCMC approach results in re-peated sets of simulated values for the para-meters (, ) from their posterior distributionP (, |y). Samples from the marginal distribu-tions (e.g. P (i|y)) are then approximated bysimply picking out the values for one parame-ter from the simulated samples for (, ) ignor-ing the other parameters. Point estimates con-cerning the parameter are then obtained fromthe sample mean, of that set of values.

    The basic model for relative risk that hasbeen considered so far allows for Poissonoverdispersion in the distribution of diseasecounts yi via the random effects i. This maypartially account for unmeasured covariatesthat induce spatial dependence in the yi , but itdoes not allow for explicit spatial dependencebetween the yi. The latter may be present (e.g.see Clayton et al., 1993) arising, for example,through lesser variability of rates in neighbor-ing densely populated urban areas as opposedto sparsely populated rural areas, or throughan infectious aetiology of the disease. Such ex-plicit spatial dependence may be incorporatedinto the model by including an additional spa-tially structured random effect term (e.g. seeClayton & Bernardinelli, 1996; Molli, 1995).The model is extended to: log i = log ei + i + i ,so that now the log relative risks are given byi + i. The priors and hyperpriors relating to iare as before. But i are taken to have a spatial-ly structured prior. A typical choice is to use aconditional intrinsic Gaussian autoregres-sive model (an example of a CAR, see Besag &Kooperberg, 1995) where:

    i |ji ~ N (ji wij j ,

    2 )ji wij ji wij

    where wij are suitably chosen proximity weightsfor the areas (often simply 1 if two areas are ad-jacent, 0 otherwise) and the new hyperparame-ter controls the strength of local spatial de-pendence. Typically a vague gamma hyperpri-or is assumed for 1/2. MCMC methods thenprovide samples from P (, |y) where now =(, , ).

    Further extensions to the disease mappingmodel are possible to include covariates whichcorrect for known risk factors other than thoseincorporated into the direct standardization

  • BAILEY, T. C.1088

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    term (i.e. the expected cases ei ). In that case,the model essentially becomes similar to thatused in ecological investigations (see SectionEcological Studies). Indirect standardization isalso one example of this, where population val-ues in age/sex groups are treated as covariatesin the model with associated unknown overallgroup reference rate parameters and this re-places the known expected number of casesei . The group reference rates are then estimat-ed as part of the model.

    Mapping case event data

    Again various approaches exist, but generallythere has been less work in this area than onaggregated data and a mainstream method-ology is more difficult to identify. The basicmodel usually adopted is that locations of indi-vidual cases and of individuals in the popula-tion at risk both arise as inhomogenous Pois-son processes with spatially varying intensities(events per unit area) denoted by (s) and (s)respectively, where s represents spatial posi-tion. Then:

    (s) = (s)(s)

    where is the overall reference rate for thedisease and (s) is the relative risk surface. Of-ten interest focusses on the estimation of logrelative risk (s) = log (s) rather than directlyon (s).

    A typical practical situation is that data areavailable on n = n1 + n2 point locations whichcorrespond to n1 cases of the disease at loca-tions (s1 , . . . , sn1 ) and n2 cases of a suitablecontrol group at locations (sn1+1 , . . . , sn ). Thecontrol group may be primary data, or sec-ondary data obtained by appropriate simula-tion of the location of control cases from socio-demographic data aggregated to small areascovering the region concerned. Given this datastructure, a straightforward approach (seeBithel, 1990; Kelsall & Diggle, 1995) is then tonon-parametrically estimate (s) (to within anadditive constant) via a log ratio of Kernel esti-mates as:

    (s) = log( n1i=1 K (s - si ; ) )ni=n1+1 K(s - si ; )

    where K(.) is some suitable radially symmetrickernel function and is a suitably chosen band-width.

    Choise of an optimal bandwidth is howev-er rather difficult in the above approach (seeKelsall & Diggle, 1995) and more recent work

    (Kelsall & Diggle, 1998) avoids that problem byadopting a non-parametric binary regressionapproach which results in an indirect estimateof (s). Essentially the method is to attach bi-nary values yi to all n data locations such thatyi = 1 if the location corresponds to a diseasecase and yi = 0 if it does not. The probabilitythat any point is a disease case, (s), is then es-timated via kernel regression (e.g. see Green &Silverman, 1994) of yi on si i.e. by:

    (s) = ni=1 K (s - si ; )yini=1 K(s - si ; )

    Then (to an additive constant) (s) is givenby logit ( (s)) i.e. by

    log ( (s) ).1 - (s)

    Bandwith selection methods are then easier tohandle and the approach can also be extendedto include covariates measured on each indi-vidual so as to correct for additional known riskfactors for the disease. This is achieved via aGeneralised Additive Model (GAM) as discussedlater in the Section Ecological Studies.

    Further issues and approaches in disease mapping

    There are several further extensions and varia-tions on the basic ideas used in disease map-ping. This section lists some of the most signif-icant issues that have been considered with rel-evant references.

    General methods for simple exploratoryanalyses of spatial data which may be usefullyapplied in relation to disease incidence havebeen investigated by several authors (e.g. Cis-laghi et al., 1995; Haining et al., 1998; Unwin &Unwin, 1998; Walter, 1993b; Wilhelm & Steck,1998). The addition of further covariates to fur-ther refine the basic disease mapping modelhas already been mentioned (e.g. see Bernar-dinelli et al., 1997; Clayton & Bernardinelli,1996; Clayton et al., 1993; Martuzzi & Elliott,1996; Molli, 1995; Muller et al., 1997; Xia et al.,1997). Several authors have also consideredhow models can be extended to handle diseaseincidence data which has a temporal, as well asa spatial dimension (e.g. Bernardinelli et al.,1995b; Knorr-Held & Besag, 1998; Waller et al.,1997). Special problems introduced by edge ef-fects in disease mapping have been discussedby Lawson et al. (1999b). Bayesian mixture orlatent structure models have also been used indisease mapping as an alternative to the moreconventional models discussed earlier (e.g.Richardson & Green, 1997; Schalttmann & Bohn-

  • SPATIAL STATISTICS 1089

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    ing, 1993). Some other studies have also con-sidered the application of geostatistical inter-polation models (primarily variants of krig-ing) to the analysis of disease rates (e.g. Carratet al., 1992; Webster et al., 1994).

    Ecological studies

    As mentioned previously, ecological studiesare concerned with studying associations be-tween observed incidence of disease and po-tential risk factors as measured on groupsrather than individuals, where these groups aretypically defined by geographical areas or loca-tion. Such studies are valuable in investigatingthe aetiology of disease and may help to targetfurther research and possibly preventativemeasures.

    From a statistical point of view ecologicalstudies involve regression type models, but themodels are complicated by the need to allowfor both spatial and aspatial confounding fac-tors (see Clayton et al., 1993; Prentice & Shep-pard, 1995; Richardson et al., 1992). Usuallysuch studies involve observed data on diseaseincidence which is aggregated to an irregularlattice i.e. counts of cases and correspondingpopulations in areal units. However, more re-cently there has also been some work studyingassociations between suspected risk factorsand disease incidence using case event data ormixtures of aggregated data (relating to riskfactors) and individual data (relating to diseaseincidence). Models for aggregated disease inci-dence data and for that which involves caseevents are dealt with separately below. The ag-gregated nature of the data normally involvedin ecological studies has led to considerableemphasis on the need to avoid, and if possible,correct for the so-called ecological fallacy i.e.the various forms of bias associated with mak-ing inference about the effects of factors onthe disease risk of individuals from relation-ships obtained on groups where within groupvariability cannot be assessed (e.g. see Axel-son, 1999; Elliot et al., 1996; Prentice & Shep-pard, 1995).

    Models for aggregated data in ecological studies

    As in disease mapping, several different ap-proaches have been used. The one that tendsto dominate in the literature uses extensions tothe Bayesian hierarchical models employed indisease mapping and the focus here is mostlyon that framework. It should be noted howev-

    er, that other forms of spatial regression modelhave also been adopted, some of which havepotential advantages in terms of addressingecological bias and that point is returned to inSection Further Issues and Approaches in Eco-logical Studies.

    The basic Bayesian hierarchical modeladopted is a straightforward extension of thatdiscussed under disease mapping. Now K co-variates, (xi1 , . . . , xiK), are included and relatedto suspected risk factors measured in eacharea, so that the model becomes:

    log i = log ei + K

    k=1k xik + i + i

    with i, ei, i, i as in Section Mapping Aggregat-ed Data and k are new parameters reflectingthe influence of each covariate on the log rela-tive risk which is now modelled as k kxik + i+ i . As mentioned previously, one could dropthe direct standardization term, log ei , andinstead use indirect standardization by incor-porating a constant 0 and including amongstthe covariates relevant measures of populationage/sex structure.

    The priors and hyperpriors for i and i arechosen as in Section Mapping Aggregated Da-ta. The new parameters, , are each taken tohave specified non-informative priors (e.g.Normal distributions with large variances). Wethen proceed as before using MCMC methodsto derive samples from the posterior P (, , |y)with referring, as earlier, to the vector of hy-perparameters relating to the random effects iand i. Further details and variations on thisbasic modelling framework may be found inmany published examples of ecological studies(e.g. see Bernardinelli et al., 1997; Clayton etal., 1993; Lawson et al., 1999a; Molli, 1995;Richardson et al., 1992; Rushton et al., 1996;Spiegelhalter, 1998)

    Models for case event data in ecological studies

    There has been relatively little work in this areacompared with that devoted to aggregated dis-ease incidence data. Some recent initiatives in-clude work by Kelsall & Diggle (1998) and Law-son & Clark (1999). The latter work was men-tioned in passing in relation to disease map-ping in Section Mapping Case Event Data. It in-volves the extension of the non-parametric bi-nary regression model considered there to aGAM. Recall that in the model discussed previ-ously the focus was on estimation of (s), theprobability that any point is a disease case in acombined realization of cases and controls, the

  • BAILEY, T. C.1090

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    log relative risk surface, (s), is then related tothis (up to an additive constant) by logit ((s)).When additional risk factors are involved theninstead of estimating (s) by kernel regression,K covariates are included, (x1(s), . . . , xK(s)), inthe regression, so that the data yi are observa-tions on a binary response variable (case ornot case) with associated probability (s)such that:

    logit ((s)) = K

    k=1k xk(s) + (s)

    If (s) is assumed to be a smooth functionof s, then the above represents a GAM with alogit link function. GAMs are fitted by an itera-tively weighted additive model procedure (seeHastie & Tibshirani, 1990) that is implementedin several software packages (e.g. Splus Vern-ables & Ripley, 1994) Alternative ways of han-dling associations between suspected risk fac-tors and disease incidence using case event da-ta are discussed in Best et al. (1998); Lawson &Clark (1999) and Lawson et al. (1999a).

    Further issues and approaches in ecological studies

    Several further extensions and variations onthe basic models used in ecological studieshave been investigated. This section lists someof the most significant issues that have beenconsidered with relevant references.

    The general approach of graphical models(e.g. Spiegelhalter, 1998) provide a particularlyvaluable framework within which to specify thedependency structure of hierarchical Bayesianecological models. Corrections to adjust formeasurement error in the covariates have beensuggested by Bernardinelli et al. (1997). Mix-tures of case event and aggregated data havebeen discussed by Best et al. (1998) and Plum-mer & Clayton (1996). Thomson et al. (1999)has considered a situation involving aggregat-ed data corresponding to a mix of different ge-ographical scales. Bayesian latent structure ormixture models have also been employed inecological studies as an alternative to the moreconventional model discussed in Section Mod-els for Aggregated Data in Ecological Studies(e.g. Schalttmann et al., 1996; Weir & Pettitt,1999). Multi-level models (Goldstein, 1995)have also been employed as an alternative tothe Bayesian approach (e.g. Congdon, 1998;Langford & Lewis, 1998). Other forms of spatialregression models have also been adopted(Brunsdon et al., 1998; Christiansen & Morris,1997; Ghosh et al., 1998; Prentice & Sheppard,1995; Wolpert & Ickstadt, 1998; Yasui & Lele,

    1997) Some of these involve so-called aggre-gated models which are particularly orientat-ed to reducing ecological bias by combiningpartial samples of individual level data on riskfactors in addition to that on groups at the are-al level. Ecological models appropriate for spa-tio-temporal data have also been considered(e.g. Bernardinelli et al., 1995b; Knorr-Held &Besag, 1998; Waller et al., 1997; Wikle et al.,1998). Methods for longitudinal data in general(e.g. see Diggle et al., 1994) have also been ap-plied in ecological studies.

    Disease clustering studies

    As mentioned earlier, disease clustering stud-ies seek to establish significant unexpectedelevated risk of a disease either in space, or inspace and time. Such localized clusters couldarise from many factors e.g. an unidentified in-fectious agent, localized pollution sources, orlocalized common treatment side effects. Thereare several comprehensive general reviews ofthe area (e.g. Alexander & Boyle, 1996; Alexan-der et al., 1991; Anderson & Titterington, 1997;Bithell, 1995; Hills & Alexander, 1989; Kulldorf& Nagarwalla, 1995; Olsen et al., 1996).

    In general, disease cluster studies may seekto investigate a general tendency to cluster(no prespecified locations or number of sus-pected hazards) or be concerned with focussedclustering (prespecified number and locationsfor putative hazards). The two situations arediscussed separately below. Note that the sec-ond situation naturally provides for a morepowerful statistical test of the suspected clus-tering because the hypothesis is more tightlyspecified. However, there is a clear need to avoidwhat is sometimes referred to as centering thetarget where the bullet strikes which in thiscontext would imply using the data to explorewhere elevations in risk appear to exist andthen subsequently using those locations in atest of focussed clustering.

    In general, disease clustering studies mayinvolve either case event or aggregated data(see Diggle & Elliott, 1995, for a discussion ofthe relative merits). In both cases known popu-lation heterogeneity and other covariates mustbe allowed for along with any natural tendencyto cluster through effects induced by data ag-gregation or inadequately measured covariates.

  • SPATIAL STATISTICS 1091

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    Assessment of general clustering

    A large amount of work in this area has fo-cussed on development of hypothesis tests fora general tendency to cluster. Some of the mostcommonly used tests with associated refer-ences are listed below.

    For aggregated data, the key hypothesistesting approaches suggested include thosediscussed by: Alexander et al. (1991); Assuno& Reis (1999); Besag & Newell (1991); Kulldorf(1997); Kulldorf & Nagarwalla (1995); Kulldorfet al. (1997); Oden (1995); Potthoff & Whit-tinghill (1966); Tango (1995); Turnbull et al.(1990); Walter (1993a, 1994); and Wittemore etal. (1987). Many of these various tests are moreor less refined variations on similar themes.Some are concerned only with assessment of atendency to cluster, others also identify the lo-cations where such clustering occurs. For caseevent data, key hypothesis tests include thosediscussed by: Anderson & Titterington (1997);Cuzick & Edwards (1990); Diggle & Chetwynd(1991); and Oppenshaw (1991); Again some ofthese are concerned only with assessment of atendency to cluster, others also identify the lo-cations where such clustering occurs. Hypoth-esis tests designed to detect spatio-temporalinteraction include those discussed by: Baker(1996); Jacquez (1996b); Knox (1964); Kulldorf& Hjalmars (1999); Lawson & Viel (1995) andMantel (1967).

    The problem with many such hypothesistests for general clustering is that positive re-sults invariably leave subsequent questionsunanswered how many clusters are there?How big are they? Where are they? For that rea-son approaches to disease clustering whichemploy an explicit model have some advan-tages. Recent work by Lawson & Clark (1999)typifies that kind of approach. In the case eventsituation they suggest extending the kind ofcase event model discussed in relation to dis-ease mapping in Section Mapping Case EventData to:

    (s) = (s)m(s; , 1 , . . . , , )

    where, as before, (s) is the intensity of diseasecases, (s) is that of the population at risk and is an overall disease rate, but now the previ-ous unknown relative risk surface (s) is re-placed by the specified function m() which isparameterised in terms of an unknown num-ber of clusters, , a corresponding unknown setof cluster locations, (1 , . . . , ), and a set offurther parameters, , which relate to the riskdecay around clusters.

    For example, one possible specification forsuch a model might be:

    (s) = (s) {1 +

    k=1

    e-(s - 2)/22 }2

    In that case, it has been suggested that (s)be estimated from a set of controls using nonparametric density estimation with a suitablebandwidth ( can be derived separately or con-sidered an additional unknown parameter inthe Bayesian framework). Given such an esti-mate, (s), MCMC methods are then used to es-timate the joint posterior for all the remainingunknown parameters involved. Note that sincethe number of parameters depends on , whichis itself a parameter, then this model is like aBayesian mixture model with an unknownnumber of components and reversible jumpMCMC sampling must be used (e.g see Richard-son & Green, 1997). This model for case eventdata can be further generalized to allow for co-variates and random effects in m(). It can alsobe adapted for use with aggregated data con-sisting of counts yi in areas Ai with means i via:

    i = Ai (s)m(s; , 1 , . . . , , , )ds

    Details of how this is handled in a MCMCframework are provided in Lawson & Clark(1999) and Lawson et al. (1999a).

    Assessment of focussed clustering

    As for general clustering, several hypothesistests for focussed clustering have been pro-posed. Some of the most commonly used testswith associated references are listed below.

    For aggregated data, key hypothesis testingapproaches for focussed clustering includethose discussed by: Besag & Newell (1991);Stone (1988) (the focussed version of the gener-al clustering test); Bithel (1995); Lawson (1993);and Waller & Lawson (1995). Correspondingtests for case event data include those dis-cussed by: Cuzick & Edwards (1990); Stone(1988) (the focussed version of the generalclustering test); Diggle & Elliot (1995); Korie etal. (1998); and Lawson & Waller (1996).

    Again explicit modelling approaches haveadvantages if it is possible to use them, and thekind of models discussed in Section Assessmentof General Clustering can be used with pre-specified cluster locations (e.g. see Lawson,1995). In the simplest situation for case eventdata with a single putative source at known lo-cation S0, a suitable model might take the form:

  • BAILEY, T. C.1092

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    (s) = (s) (1 + 1 e-2 s - s0 )

    MCMC methods are then used to estimate 1and 2 with (s) estimated from a set of controlsusing non parametric density estimation with asuitable bandwidth ( can be derived separate-ly or considered an additional unknow parame-ter in the Bayesian framework). Extensions toinclude covariates can also be developed.

    Earlier work by Diggle & Rowlingson (1994)and Diggle et al. (1997) uses a similar model,but avoids the density estimation of (s) by fo-cusing on the probability that an event is a dis-ease case in the combined point process con-sisting of both cases and controls, as discussedpreviously in Section Maping Case Event Data.The model for a single source at known loca-tion s0 is taken as:

    (s) = (s)(1 + m(s - s0 ;)

    where m(.) is a suitably chosen function to re-flect risk decay around the source. As before,binary values yi = 1 if point is a disease case andyi = 0 if it is not. Then if

    (s) =(s)

    (s) + (s)denotes the probabilty that any point is a dis-ease case, we have:

    logit ((s)) = log ( (s) ) = log + log (1 + m(s s0 ; )(s)

    So (s) has been conditioned out of the mod-el and logistic regression with a binary responsemay be used to estimate the parameters . Thisis a non-linear regression, but with a suitablechoice of m(.) maximum likelihood estimationis relatively straightforward. Biggeri et al. (1996)provide details of a case study which employsthis kind of model.

    Further issues and approaches in disease clustering studies

    There are several further extensions and varia-tions relating to disease clustering studies. Someof the most significant issues that have been con-sidered are listed below with relevant references.

    Local indicators of association (e.g. Anselin,1995; Getis, 1992) are general exploratory meth-ods for spatial data which may have potentialapplication in the preliminary phase of diseaseclustering studies. Models (rather than signifi-cance tests) that have relevance to spatio-tem-poral disease clustering investigation are dis-cussed by Bernardinelli et al. (1995b); Knorr-Held & Besag (1998). Edge effect considera-tions in disease clustering are discussed by

    Lawson et al. (1999b). Cressie (1996) discussesinference for extreme values in general withrelevance to cluster detection. Methods for in-corporating directional or scale effects in theeffects to be expected from putative sources ofhazard have also been developed (e.g. Lawson& Viel, 1995; Waller & Turnbull, 1993). Jacquez(1996a) also discusses how uncertainy in thelocation of suspected sources of hazard may behandled.

    Environmental assessment and monitoring

    There are many known or suspected environ-mental factors that influence health (e.g. nu-clear contamination, chemical toxins, air pol-lution, climatic or vegetation conditions thatmay influence distribution of disease vectorsetc.). The quantity and quality of data on theenvironment is constantly increasing, particu-larly that from remote sensing platforms. Sta-tistical models of environmental processes(e.g. see Piegorsch et al., 1998) allow spatial orspatio-temporal prediction of environmentalfactors which may then be used in conjunctionwith studies concerned with investigating dis-ease aetiology or establishing public healthintervention programmes (e.g. see Diggle &Richardson, 1993). The environmental process-es under study usually exhibit strong local spa-tial, temporal and exogenous variability whichneeds to be allowed for in prediction models.

    Environmental modelling is a very widefield and there is not space to discuss it in de-tail here. In general, remote sensing and imageprocessing techniques increasingly play a keyrole (e.g. see Besag et al., 1991; Datcu et al.,1998; Schroder et al., 1998; Stein et al., 1998b)as do advances in modelling of Markov Ran-dom Fields (e.g. see Aykroyd, 1998; Cressie &Davidson, 1998; Tjelmeland & Besag, 1998).Many of the models used in environmentalanalysis adopt a Bayesian approach (e.g. seeBesag & Green, 1993; Christakos & Li, 1998;Gaudard et al., 1999). In some cases there is aneed to particularly focus on the prediction ofthreshold values of a phenomena in which caseextreme value modelling (e.g. see Coles & Pow-ell, 1996) may be necessary. Many studies in-volve spatio-temporal data (e.g. Kyriakidis &Journel, 1999; Stein et al., 1998a; Wikle et al.,1998). Other related areas include spatial sam-pling considerations (e.g. see Cox et al., 1997)and a need for versatile exploratory methodsfor spatial and space-time environmental data(e.g. Cook et al., 1997).

  • SPATIAL STATISTICS 1093

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    One recent general development concern-ing spatial prediction models is worthy of par-ticular note here since it may have consider-able potential in relation to health studies. Themethodology of kriging in its many variousguises (e.g. see Cressie, 1993), provides a versa-tile prediction tool for many geostatisticalprocesses in space, or in space and time, andhas usefully been employed in the predictionof environmental processes. However, krigingis conventionally concerned with prediction ofGaussian spatial or spatio-temporal process(e.g. it can over smooth when distribution isnon-Gaussian). A significant recent develop-ment (Diggle et al., 1998) is the generalizationof this methodology to situations in which datais non-Gaussian. The essential idea embedslinear kriging methodology within a non-linearand more general distributional framework,analogous to the embedding of standard leastsquares regression within the framework ofgeneralized linear models (GLMs). A Bayesianapproach, implemented through MCMC meth-ods, is then used to fit the associated model.Diggle et al. (1998) provide details and applica-tions of the approach.

    Software in geographical/environmental epidemiology

    A recurring theme in this paper is the compu-tationally intensive nature of many of the sta-tistical methods discussed. In this section someof the key software environments that exist tosupport the use of these methods are brieflydiscussed.

    One computing environment which nowdominates in much of the literature concernedwith statistical methods in geographical epi-demiology (as in many other areas of statisticalanalysis) is the versatile statistical computinglanguage S-Plus (or the freely available publicdomain similar language R). A number of addon S-Plus packages particularly orientated tospatial applications are also available, in par-ticular S+Spatial and S+GeoStat. The former in-cludes several general purpose routines forspatial analysis, including point pattern analy-sis, some forms of spatial regression and sim-ple kriging; whilst the latter is orientated moreto geostatistical modelling. There are also anumber of relevant public domain S-PLus li-braries of functions supplied by third partiessuch as: SPLancs (point pattern analysis), geoS(geostatistical functions), Oswald (longitudinaldata analysis) and spatial (basic spatial statis-tics). Many other relevant Splus functions (or

    groups of functions) are also available on theInternet from many individual contributors.Some of the above functionality is also avail-able for R the public domain version of Splus.

    S-plus or R do not in themselves provide forMCMC methods. Functionality in this area isprovided by BUGS (Spiegelhalter et al., 1997)or, more recently, WinBUGs, both available inthe public domain. These packages are able toimplement many of the Bayesian models dis-cussed in earlier sections of this paper. A pub-lic domain link between BUGS and S-plus alsoexists known as CODA which enables resultsfrom BUGS to be easily transferred to S-plus forsubsequent analysis.

    S-Plus, R and BUGS provide no direct abili-ty to geographically visualize or map spatial re-sults (although a Geo BUGS add-on is forth-coming). For that purpose it is necessary to usethem in conjunction with a suitable Geograph-ical Information System (GIS). The most com-monly used packages in this regard in the healtharea are probably ARC/INFO and/or ARC/View(ESRI, 1996) and MapInfo (MapInfo Corpora-tion, 1994). S-Plus provides a link to Arc/Viewwhich allows results to be transferred andmapped relatively easily. More special purposecomputing packages for particular kinds ofanalysis relevant in geographical or environ-mental epidemiology include: ECOSSE (geosta-tistical/environmental modelling); DisMapWin(epidemiological mixture models Schalttmannet al., 1996); MLWin (multi-level modelling);and Stat!, Gamma or SatScan (each relating tovarious types of spatial disease clustering testsand associated analysis). Full details of the var-ious packages or libraries mentioned in thissection are easily available through the Inter-net and those references are not repeated here.

    Some closing remarks on statisticalmethods in geographical and environmental epidemiology

    Given the rich variety of methods discussed inthis paper, it is clear that the state of the artin statistical methods appropriate to certainproblems in spatial epidemiology containssome powerful, versatile and useful tools. Re-search interest is strong and undoubtedly fur-ther developments and more sophisticatedtechniques will continue to develop. Many ofthe existing spatial methods and models arefairly widely known in the statistical communi-ty and some of them have been in use for sev-eral years. Spatial methods are less familiaramongst epidemiologists and public health

  • BAILEY, T. C.1094

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    specialists, but there are now a number of ac-cessible texts, both on general introductoryspatial analysis (e.g. Bailey & Gatrell, 1995), onmore advanced spatial statistics (e.g. Cressie,1993), on general statistical modelling (e.g.Venables & Ripley, 1994) and on MCMC meth-ods (e.g. Gilks et al., 1996). These and similartexts, combined with the increasing amount ofpublished work more specifically related tospatial epidemiology (as referenced in this pa-per), means that relevant methods and accessto supporting software environments are nowbecoming better disseminated to health spe-cialists. Hopefully that situation will continueto improve and spatial methods and modelswill be increasingly used where appropriate.

    Reflection on the methods discussed in thepaper does however reveal some areas whichemerge as significantly under played amongstthe mainstream methods and models. Theseinclude: a greater requirement for methods ca-pable of handling mixtures of data types (e.g.at different levels of aggregation, or mixtures ofcase event and aggregated data, or a combina-tion of information from remotely sensed im-agery with that from more conventional healthor demographic data sources); methods de-signed to better address the problem of ecolog-ical bias of various types; methods to betterhandle the spatio-temporal considerations pre-sent in many studies; the fact that there is cur-rently a relative absence of methods designedto handle multivariate spatial responses; andalso that the current spatial methods place aheavy reliance on the Euclidean distance met-ric combined with relatively crude topographicassumptions despite the potentially powerfulfunctionality that GIS can now provide in thatarea.

    Such areas provide a rich agenda for furtherstudy and some related exciting and diffcultchallenges, particularly in the area of the mul-tivariate study of groups of related diseases inspatial epidemiology and in incorporatingmore realistic and sophisticated measures ofspatial proximity and spatial structure intomodels which appropriately exploit the de-tailed geographical information which is nowavailable through GIS and remotely sensing.

    In concluding this review of spatial statisti-cal methods in health it is also appropriate tocomment briefly on some wider and less statis-tical issues. The first point to acknowledge isthat geographical epidemiology is epidemiolo-gy first and foremost and not statistics. Valu-able spatial epidemiology does not necessarilyfollow from the use of better and more sophis-ticated statistical methods. Clearly, the ulti-

    mate benefits of all the statistical effort in geo-graphical epidemiology also depends cruciallyon appropriate and well-founded epidemio-logical considerations combined with access todata at an appropriate level of detail and of suf-fcient quality to address the issues under con-sideration. The value of clear-cut, well designedgeographical epidemiological studies associat-ing disease with specific agents are not contro-versial. However, regardless of the sophistica-tion of the statistical models employed, gener-al geographical studies of widespread risk fac-tors usually come up with relatively low rela-tive risk estimates (resulting from low grade ex-posure, the diffculties of obtaining good expo-sure contrasts, and the problems of confound-ing effects). This can cause credibility prob-lems for the results and limit their implicationsfor public health response. For example, of thehundreds of disease clustering investigationsconducted there are only a few examples of re-al success in terms of substantive advancesin aetiological knowledge, or developments inpublic health (e.g. see Neutra, 1990).

    Ultimately, geographical/environmentalepidemiology needs to be evaluated in thesame way as any other public health screeningprograms (e.g. see Axelson, 1999, Neutra, 1999).In some such applications which involve thespatial statistical analysis of already existingdata, or that arise from routine collection sys-tems, the screening is relatively cheap. How-ever, this has to be balanced against the impli-cations of false positive findings, the potentialfor effective intervention in cases of true posi-tive findings, and the costs of the follow up andmore focussed studies that will inevitably benecessary in such cases. Such considerationsare important and whilst they do not mitigateagainst the development and use of improvedstatistical methodology, they do emphasizethat the value of such methods has to be viewedin the context of a wider and ultimately morecomplex set of public health and epidemiolog-ical concerns.

  • SPATIAL STATISTICS 1095

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    References

    ALEXANDER, F. E. & BOYLE, P., 1996. Methods for In-vestigating Localised Clusters of Disease. IARCScientific Publication 135. Lyon: InternationalAgency for Research on Cancer.

    ALEXANDER, F. E.; McKINNEY, P. A.; CARTWRIGHT,R. A. & RICKETTS, T. J., 1991. Methods of map-ping and identifying small clusters of diseasewith applications to geographical epidemiology.Geographical Analysis, 23:156-173.

    ANDERSON, N. H. & TITTERINGTON, D. M., 1997.Some methods for investigating spatial cluster-ing, with epidemiological applications. Journal ofthe Royal Statistical Society, Series A, 160:87-105.

    ANSELIN, L., 1995. Local indicators of spatial associ-ation LISA. Geographical Analysis, 27:93-115.

    ASSUNO, R. M. & REIS, E. A., 1999. A new proposalto adjust Morans I for population density. Statis-tics in Medicine, 18:2147-2162.

    AXELSON, O., 1999. The character and the publichealth implications of ecological analyses. In:Disease Mapping and Risk Assessment for PublicHealth (A. Lawson, A. Biggeri, D. Bhning, E.Lesaffre, J.-F. Viel & R. Bertollini, ed.), pp. 301-309,New York: Wiley and Sons.

    AYKROYD, R. G., 1998. Bayesian estimation for homo-geneous and inhomogeneous Gaussian randomfields. IEEE Transactions on Pattern Analysis andMachine Intelligence, 20:533-539.

    BAILEY, T. C. & GATRELL, A. C., 1995. Interactive Spa-tial Data Analysis. Essex: Longman.

    BAKER, R. D., 1996. Testing for space-time clusters ofunknown size. Journal of Applied Statistics, 23:543-554.

    BERNARDINELLI, L.; CLAYTON, D. & MONTOMOLI,C., 1995a. Bayesian Estimates of disease maps:How important are priors? Statistics in Medicine,14:2411-2431.

    BERNARDINELLI, L.; CLAYTON, D.; PASCUTTO, C.;MONTMOLI, C. & GHISLANDI, M., 1995b.Bayesian analysis of space-time variation in dis-ease risk. Statistics in Medicine, 14:2433-2443.

    BERNARDINELLI, L. & MONTOMOLI, M., 1992. Em-pirical Bayes versus fully Bayesian analysis of ge-ographical variation in disease risk. Statistics inMedicine, 11:983-1007.

    BERNARDINELLI, L.; PASCUTTO, C.; BEST, N. G. &GILKS, W. R., 1997. Disease mapping with errorsin covariates. Statistics in Medicine, 16:741-752.

    BESAG, J. & GREEN, P. J., 1993. Spatial statistics andbayesian computation. Journal of the Royal Sta-tistical Society, Series B, 55:25-37.

    BESAG, J. & KOOPERBERG, C., 1995. On conditionaland intrinsic autoregressions. Biometrika, 82:733-746.

    BESAG, J. & NEWELL, J., 1991. The detection of clus-ters in rare diseases. Journal of the Royal Statisti-cal Society, Series A, 154:143-155.

    BESAG, J.; YORK, J. & MOLLI, A., 1991. Bayesian im-age restoration with two applications in spatialstatistics. Annals of the Institute of Statistics andMathematics, 43:1-20.

    BEST, N.; ICKSTADT, K. & WOLPERT, R., 1998. SpatialPoisson Regression for Health and Exposure DataMeasured at Disparate Spatial Scales. Technical

    Report. London: Department of Epidemiologyand Public Health, Imperial College School ofMedicine.

    BIGGERI, A.; BARBONE, F.; LAGAZIO, C.; BOVENZI,M. & STANTA, G., 1996. Air pollution and lungcancer in Trieste: Spatial analysis of risk as a func-tion of distance from sources. EnvironmentalHealth Perspectives, 104:750-754.

    BITHELL, J., 1990. An application of density estima-tion to geograhical epidemiology. Statistics inMedicine, 9:691-701.

    BITHELL, J., 1995. The choice of test for detectingraised disease risk near a point source. Statisticsin Medicine, 14:2309-2322.

    BROOKS, S., 1998. Markov Chain Monte Carlo and itsapplication. Journal of the Royal Statistical Soci-ety, Series D, 47:69-100.

    BRUNSDON, C.; FOTHERINGHAM, S. & CHARLTON,M., 1998. Geographically weighted regression Modelling spatial non-stationarity. Journal of theRoyal Statistical Society, Series D, 47:431-444.

    CARRAT, F. & VALLERON, A., 1992. Epidemiologicalmapping using the kriging method: Applicationto an influenza-like illness epidemic in France.American Journal of Epidemiology, 135:1293-1300.

    CHRISTAKOS, G. & LI, X. Y., 1998. Bayesian maximumentropy analysis and mapping: A farewell to krig-ing estimators? Mathematical Geology, 30:435-462.

    CHRISTIANSEN, C. & MORRIS, C., 1997. Hierarchicalpoisson regression modelling. Journal of theAmerican Statistical Association, 92:618-632.

    CISLAGHI, C.; BIGGERI, A.; BRAGA, M.; LAGAZIO, C.& MARCHI, M., 1995. Exploratory tools for dis-ease mapping in geographic epidemiology. Sta-tistics in Medicine, 14:2363-2381.

    CLAYTON, D. & BERNARDINELLI, L., 1996. Bayesianmethods for mapping disease risk. In: Geographicand Environmental Epidemiology: Methods forSmall area Studies (P. Elliot, J. Cuzick, D. English,& R.Stern, ed.), pp. 205-220, Oxford: Oxford Uni-versity Press.

    CLAYTON, D.; BERNARDINELLI, L. & MONTOMOLI,C., 1993. Spatial correlation and ecological analy-sis. International Journal of Epidemiology, 22:1193-1201.

    CLAYTON, D. & KALDOR, J., 1987. Empirical Bayesestimates of age-standardised relative risks foruse in disease mapping. Biometrics, 43:671-681.

    COLES, S. G. & POWELL, E. A., 1996. Bayesian meth-ods in extreme value modelling: A review andnew developments. International Statistical Re-view, 64:119-136.

    CONGDON, P., 1998. A multilevel model for infanthealth outcomes: Maternal risk factors and geo-graphic variation. Journal of the Royal StatisticalSociety, Series D, 47:159-182.

    COOK, D.; SYMANZIK, J.; MAJURE, J. J. & CRESSIE,N., 1997. Dynamic graphics in a GIS: More exam-ples using linked software. Computers and Geo-sciences, 23:371-385.

    COX, D. D.; COX, L. H. & ENSOR, K. B., 1997. Spatialsampling and the environment: Some issues anddirections. Environmental and Ecological Statis-tics, 4:219-233.

  • BAILEY, T. C.1096

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    CRESSIE, N., 1993. Statistics for Spatial Data. Chich-ester: Wiley.

    CRESSIE, N., 1996. Bayesian and constrained infer-ence for extremes in epidemiology. In: AmericanStatistical Association, Joint Meetings 1995, Epi-demiology Proceedings, pp. 11-17. Alexandria:American Statistical Association.

    CRESSIE, N. & DAVIDSON, J. L., 1998. Image analysiswith partially ordered markov models. Computa-tional Statistics and Data Analysis, 29:1-26.

    CUZICK, J. & EDWARDS, R., 1990. Spatial clusteringfor inhomogeneous populations. Journal of RoyalStatistical Society, Series B, 52:73-104.

    DATCU, M.; SEIDEL, K. & WALESSA, M., 1998. Spatialinformation retrieval from remote-sensing im-ages Part 1: Information theoretical perspective.IEEE Transactions on Geoscience and RemoteSensing, 36:1431-1445.

    DEVINE, O. & LOUIS, T., 1994. A constrained empiri-cal Bayes estimator for incidence rates in areaswith small populations. Statistics in Medicine,13:1119-1133.

    DIGGLE, P. J. & CHETWYND, A., 1991. Second-orderanalysis of spatial clustering for inhomogeneouspopulations. Biometrics, 47:1155-1163.

    DIGGLE, P. J. & ELLIOTT, P., 1995. Disease risk nearpoint sources: Statistical issues for analyses usingindividual or spatially aggregated data. Journal ofEpidemiology and Community Health, 49:S20-S27.

    DIGGLE, P. J.; LIANG, K.-Y. & ZEGER, S. L., 1994.Analysis of Longitudinal Data. Oxford: ClarendonPress.

    DIGGLE, P. J.; MORRIS, S.; ELLIOTT, P. & SHADDICK,G., 1997. Regression modelling of disease risk inrelation to point sources. Journal of the Royal Sta-tistical Society, Series A, 160:491-505.

    DIGGLE, P. J. & RICHARDSON, S., 1993. Epidemio-logic Studies of Industrial Pollutants An intro-duction. International Statistical Review, 61:203-206.

    DIGGLE, P. J. & ROWLINGSON, B. S., 1994. A condi-tional approach to point process modelling of el-evated risk. Journal of the Royal Statistical Soci-ety, Series A, 157:433-440.

    DIGGLE, P. J.; TAWN, J. A. & MOYEED, R. A., 1998.Model-based geostatistics. Journal of the RoyalStatistical Society, Series C, 47:299-326.

    ELLIOTT, P.; CUZICK, J.; ENGLISH, D. & STERN, R.,1996. Geographic and Environmental Epidemiol-ogy: Methods for Small Area Studies. Oxford: Ox-ford University Press.

    GATRELL, A. & LOYTONEN, M., 1998. GIS and Healthin Europe. London: Taylor and Francis.

    GAUDARD, M.; KARSON, M.; LINDER, E. & SINHA,D., 1999. Bayesian spatial prediction. Environ-mental and Ecological Statistics, 6:147-171.

    GETIS, A., 1992. Spatial interaction and spatial auto-correlation: A cross product approach. Environ-ment and Planning A, 23:1269-1277.

    GHOSH, M.; NATARAJAN, K.; STROUD, T. W. F., & Car-lin, B. P., 1998. Generalized linear models forsmall-area estimation. Journal of the AmericanStatistical Association, 93:273-282.

    GILKS, W.; CLAYTON, D.; SPIEGELHALTER, D.; BEST,N.; McNEIL, A.; SHAPLES, L. & KIRBY, A., 1993.Modelling complexity: Applications of Gibbs

    sampling in medicine. Journal of the Royal Statis-tical Society, Series B, 55:39-102.

    GILKS, W. R.; RICHARDSON, S. & SPIEGELHALTER,D., 1996. Markov Chain Monte Carlo in Practice.London: Chapman and Hall.

    GOLDSTEIN, H., 1995. Multilevel Statistical Models.London: Edward Arnold.

    GREEN, P. & SILVERMAN, B., 1994. Non ParametricRegression and Generalised Linear Models. Lon-don: Chapman and Hall.

    HAINING, R.; WISE, S. & MA, J. S., 1998. Exploratoryspatial data analysis in a geographic informationsystem environment. Journal of the Royal Statisti-cal Society, Series D, 47:457-469.

    HALLORAN, E. & GREENHOUSE, J., 1997. Statisticsand Epidemiology: Environment and Health. NewYork: Springer-Verlag.

    HASTIE, T. & TIBSHIRANI, R., 1990. Generalised Addi-tive Models. London: Chapman and Hall.

    HILLS, M. & ALEXANDER, F., 1989. Statistical meth-ods used in assessing the risk of disease near asource of possible environmental pollution: A re-view. Journal of the Royal Statistical Society, SeriesA, 152:353-363.

    JACQUEZ, G. M., 1996a. Disease cluster statistics forimprecise space-time locations. Statistics in Med-icine, 15:873-885.

    JACQUEZ, G. M., 1996b. A k-nearest neighbour testfor space-time interaction. Statistics in Medicine,15:1935-1949.

    KELSALL, J. E. & DIGGLE, P. J., 1995. Non parametricestimation of spatial variation in relative risk. Sta-tistics in Medicine, 14:2335-2342.

    KELSALL, J. E. & DIGGLE, P. J., 1998. Spatial variationin risk of disease: A nonparametric binary regres-sion approach. Journal of the Royal Statistical So-ciety, Series C, 47:559-573.

    KNORR-HELD, L. & BESAG, J., 1998. Modelling riskfrom a disease in time and space. Statistics inMedicine, 17:2045-2060.

    KNOX, E. G.,1964. The detection of space-time inter-actions. Applied Statistics, 13:25-29.

    KORIE, S.; CLARK, S. J.; PERRY, J. N.; MUGGLE-STONE, M. A.; BARTLETT, P. W.; MARSHALL, E. J.P. & MANN, J. A., 1998. Analyzing maps of disper-sal around a single focus. Environmental andEcological Statistics, 5:317-344.

    KULLDORF, M., 1997. A spatial scan statistic. Com-munications in Statistics Theory and Methods,26:1481-1496.

    KULLDORF, M.; FEUER, E. J.; MILLER, B. A. & FREED-MAN, L. S., 1997. Breast cancer clusters in thenortheast United States: A geographic analysis.American Journal of Epidemiology, 146:161-170.

    KULLDORF, M. & HJALMARS, U., 1999. The Knoxmethod and other tests for space-time interac-tion. Biometrics, 55:544-552.

    KULLDORF, M. & NAGARWALLA, N., 1995. Spatialdisease clusters: Detection and inference. Statis-tics in Medicine, 14:799-810.

    KYRIAKIDIS, P. C. & JOURNEL, A. G., 1999. Geostatis-tical space-time models: A review. MathematicalGeology, 31:651-684.

    LANGFORD, I. & LEWIS, T., 1998. Outliers in multi-level models. Journal of the Royal Statistical Soci-ety, Series A, 101:121-160.

  • SPATIAL STATISTICS 1097

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    LAWSON, A., 1993. On the analysis of mortality eventsaround a prespecified fixed point. Journal of theRoyal Statistical Society, Series A, 156:363-377.

    LAWSON, A., 1995. Markov Chain Monte Carlo meth-ods for putative polution source problems in en-vironmental epidemiology. Statistics in Medicine,14:2473-2486.

    LAWSON, A.; BIGGERI, A.; BHNING, D.; LESAFFRE,E.; VIEL, J.-F. & BERTOLLINI R., 1999a. DiseaseMapping and Risk Assessment for Public Health.Chichester: Wiley.

    LAWSON, A.; BIGGERI, A. & DREASSA, E., 1999b. Edgeeffects in disease mapping. In: Disease Mappingand Risk Assessment for Public Health (A. Lawson,A. Biggeri, D. Bhning, E.Lesaffre, J.-F. Viel & R.Bertollini ed.), pp. 85-97, Chichester: Wiley.

    LAWSON, A. & CLARK, A., 1999. Markov Chain MonteCarlo methods for clustering in case event andcount data in spatial epidemiology. In: Statisticsand Epidemiology: Environment and Health (E.Halloran & J. Greenhouse, ed.), pp. 151-163, NewYork: Springer-Verlag.

    LAWSON, A. & VIEL, J.-F., 1995. Tests for directionalspace-time interaction in epidemiological stud-ies. Statistics in Medicine, 14:2383-2392.

    LAWSON, A. & WALLER, L., 1996. A review of pointpattern methods for spatial modelling of eventsaround sources of pollution. Environmetrics,7:471-488.

    MANTEL, N., 1967. The detection of disease cluster-ing and a generalised regression approach. Can-cer Research, 27:209-220.

    MARTUZZI, M. & ELLIOTT, P., 1996. Empirical Bayesestimation of small area prevalence in non-rareconditions. Statistics in Medicine, 15:1867-1873.

    MOLLIE, A., 1995. Bayesian mapping of disease. In:Markov Chain Monte Carlo in Practice (W. R.Gilks, S. Richardson & D. J. Spiegelhalter, ed.), pp.359-379, London: Chapman & Hall.

    MULLER, H. G.; STADTMULLER, U. & TABNAK, F.,1997. Spatial smoothing of geographically aggre-gated data, with application to the constructionof incidence maps. Journal of the American Sta-tistical Association, 92:61-71.

    NEUTRA, R. R., 1990. Counterpoint from a clusterbuster. American Journal of Epidemiology, 132(Sup.):1-8.

    NEUTRA, R. R., 1999. Computer geographic analysis:A commentray on its use and misuse in publichealth. In: Disease Mapping and Risk Assessmentfor Public Health (A. Lawson, A. Biggeri, D. Bhn-ing, E. Lesaffre, J.-F. Viel, & R. Bertollini, ed.), pp.311-319, Chichester: Wiley.

    ODEN, N., 1995. Adjusting Morans I for populationdensity. Statistics in Medicine, 14:17-26.

    OLSEN, S.; MARTUZZI, M. & ELLIOTT, P., 1996. Clus-ter analysis and disease mapping Why, when,how? A step by step guide. BMJ, 313:863-865.

    OPPENSHAW, S., 1991. A new approach to the detec-tion and validation of cancer clusters: A review ofopportunities, progress and problems. In: Statis-tics in Medicine (F. Dunstan & J. Pickles, ed.), pp.49-63, Oxford: Clarendon Press.

    PIEGORSCH, W. W.; SMITH, E. P.; EDWARDS, D. &SMITH, R. L., 1998. Statistical advances in envi-ronmental science. Statistical Science, 13:186-208.

    PLUMMER, M. & CLAYTON, D., 1996. Estimation ofpopulation exposure in ecological studies. Jour-nal of the Royal Statistical Society, Series B, 58:113-126.

    POTTHOFF, R. F. & WHITTINGHILL, M., 1966. Testingfor Homogeneity. Biometrika, 53:167-190.

    PRENTICE, R. & SHEPPARD, L., 1995. Aggregate datastudies of disease risk factors. Biometrika, 82:113-125.

    RICHARDSON, S. & GREEN, P., 1997. On Bayesiananalysis of mixtures with an unknown number ofcomponents. Journal of the Royal Statistical Soci-ety, Series B, 59:731-792.

    RICHARDSON, S.; GUIHENNEUC, C. & LASSERRE, V.,1992. Spatial linear models with autocorrelatederror structure. Journal of the Royal Statistical So-ciety, Series D, 41:539-557.

    RUSHTON, G.; KRISHNAMURTHY, R.; KRISHNA-MURTI, D.; LOLONIS, P. & SONG, H., 1996. Thespatial relationship between infant mortality andbirth defect rates in a US city. Statistics in Medi-cine, 15:1907-1919.

    SCHALTTMANN, P. & BHNING, D., 1993. Mixturemodels and disease mapping. Statistics in Medi-cine, 12:1943-1950.

    SCHALTTMANN, P.; BHNING, D. & DIETZ, E., 1996.Covariate adjusted mixture models with the pro-gram DismapWin. Statistics in Medicine, 15:919-929.

    SCHRODER, M.; REHRAUER, H.; SEIDEL, K. &DATCU, M., 1998. Spatial information retrievalfrom remote sensing images Part II: Gibbs-Markov random fields. IEEE Transations on Geo-science and Remote Sensing, 36:1446-1455.

    SPIEGELHALTER, D.; THOMAS, A.; BEST, N. & GILKS,W., 1997. BUGS: Bayesian Inference Using GibbsSampling. Cambridge: Medical Research CouncilBiostatistics Unit.

    SPIEGELHALTER, D., 1998. Bayesian graphical mod-elling: A case study in monitoring health out-comes. Journal of the Royal Statistical Society, Se-ries C, 47:115-133.

    SPSS INCORPORATION, 1997. SPSS for Windows. Sta-tistical Package for the Social Sciences. Release 8.0.Chicago: SPSS Inc.

    STEIN, A.; VanGROENIGEN, J. W.; JEGER, M. J. &HOOSBEEK, M. R., 1998a. Space-time statisticsfor environmental and agricultural related phe-nomena. Environmental and Ecological Statistics,5:155-172.

    STEIN, A.; BASTIAANSSEN, W. G. M.; DeBRUIN, S.;CRACKNELL, A. P.; CURRAN, P. J., FABBRI, A. G.;GORTE, B. G. H.; VanGROENIGEN, J. W.; Vander-MEER, F. D. & SALDANA, A., 1998b. Integratingspatial statistics and remote sensing. Internation-al Journal of Remote Sensing, 19:1793-1814.

    STONE, R., 1988. Investigations of excess environ-mental risks around putative sources: Statisticalproblems and a proposed test. Statistics in Medi-cine, 7:649-660.

    TANGO, T., 1995. A class of test for detecting generaland focussed clustering of rare diseases. Statis-tics in Medicine, 14:2323-2334.

    THOMSON, M. C.; CONNOR, S. J.; DALESSANDRO,U.; ROWLINGSON, B.; DIGGLE, P.; CRESSWELL,M. & GREENWOOD, B., 1999. Predicting malaria

  • BAILEY, T. C.1098

    Cad. Sade Pblica, Rio de Janeiro, 17(5):1083-1098, set-out, 2001

    infection in Gambian children from satellite dataand bed net use surveys: The importance of spa-tial correlation in the interpretation of results.American Journal of Tropical Medicine and Hy-giene, 61:2-8.

    TJELMELAND, H. & BESAG, J., 1998. Markov randomfields with higher-order interactions. Scandina-vian Journal of Statistics, 25:415-433.

    TURNBULL, B.; IWANO, E.; BURNETT, W.; HOWE, H.& CLARK, L., 1990. Monitoring for clusters of dis-ease: Application to Leukaemia incidence in up-state New York. American Journal of Epidemiolo-gy, 132:136-143.

    UNWIN, A. & UNWIN, D., 1998. Exploratory spatialdata analysis with local statistics. Journal of theRoyal Statistical Society, Series D, 47:415-421.

    VENABLES, W. N. & RIPLEY, B. D., 1994. Modern Ap-plied Statistics with S-Plus. Berlin: Springer-Verlag.

    WALLER, L.; CARLIN, B.; XIA, H. & GELFAND, A.,1997. Hierarchical spatio-temporal mapping ofdisease rates. Journal of the American StatisticalAssociation, 92:607-617.

    WALLER, L. & LAWSON, A., 1995. The power of fo-cussed tests to detect disease clustering. Statisticsin Medicine, 14:2291-2308.

    WALLER, L. & TURNBULL, B., 1993. The effect of scaleon tests for disease clustering. Statistics in Medi-cine, 12:1869-1884.

    WALTER, S., 1993a. Assessing spatial pattern in dis-ease rates. Statistics in Medicine, 12:1885-1894.

    WALTER, S., 1993b. Visual and statistical assessmentof spatial clustering in mapped data. Statistics inMedicine, 12:1275-1291.

    WALTER, S., 1994. A simple test for spatial pattern inregional health data. Statistics in Medicine, 13:1037-1044.

    WEBSTER, R.; OLIVER, M.; MUIR, K. & MANN, J.,1994. Kriging the local risk of a rare disease froma register of diagnoses. Geographical Analysis,26:168-185.

    WEIR, I. S. & PETTITT, A. N., 1999. Spatial modellingfor binary data using a hidden conditional au-toregressive Gaussian process: A multivariate ex-tension of the probit model. Statistics and Com-puting, 9:77-86.

    WHITTEMORE, A.; FRIEND, N.; BROWN, B. & HOLLY,E., 1987. A test to detect clusters of disease. Bio-metrika, 74:631-635.

    WIKLE, C. K.; BERLINER, L. M. & CRESSIE, N., 1998.Hierarchical Bayesian space-time models. Envi-ronmental and Ecological Statistics, 5:117-154.

    WILHELM, A. & STECK, R., 1998. Exploring spatialdata by using interactive graphics and local sta-tistics. Journal of the Royal Statistical Society, Se-ries D, 47:423-430.

    WOLPERT, R. L. & ICKSTADT, K., 1998. Poisson/gam-ma random field models for spatial statistics. Bio-metrika, 85:251-267.

    XIA, H.; CARLIN, B. P. & WALLER, L. A., 1997. Hierar-chical models for mapping Ohio lung cancerrates. Environmetrics, 8:107-120.

    YASUI, Y. & LELE, S., 1997. A regression method forspatial disease rates: An estimating function ap-proach. Journal of the American Statistical Associ-ation, 92:21-32.