Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne


Upload: dennis-douglas

Post on 11-Jan-2016


Page 1

Spatial Databases: Lecture 7
Spatial Statistics

DT249-4 DT228-4 Semester 2 2010

Pat Browne

Page 2

Outline

• Statistical spatial data
• Review of standard statistical concepts
• Unique features of spatial data statistics
• Spatial autocorrelation
• Spatial regression (SR) and geographically weighted regression (GWR)
• Data mining
  • Association rules
  • Co-location

Page 3

Statistical Spatial Data

In this lecture we consider spatial data that contains an attribute, e.g. house prices, occurrences of disease, occurrences of accidents, crop yield, poverty patterns, crime rates, etc. Earlier parts of the course covered the representation of physical objects such as houses, counties, and roads. These objects were arranged by theme. Here we consider attributes of those objects, e.g. the population of an ED.

Page 4

Definitions

Spatial statistics is the statistical study of spatial data that varies over discrete space, e.g. crime rates broken down by neighbourhood. Spatial statistical models can be used for estimation, description, and prediction based on probability theory (not covered).

Geostatistics is the statistical study of spatial data sets that vary over continuous space, e.g. soil quality. Interpolation and prediction techniques include kriging and variograms (not covered on this course).

Page 5

Standard statistical concepts: Independent Events

Two events A and B are statistically independent if the probability that they both occur is equal to the product of the probabilities of the two individual events, i.e.

$P(A \cap B) = P(A)\,P(B)$

This is equivalent to saying that learning that one event occurs does not give any information about whether the other event occurred too.
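The product rule can be checked by brute force on a small sample space. The sketch below is illustrative only: the two-dice sample space, the helper names `prob` and `is_independent`, and the three example events are assumptions introduced here, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on one outcome) under equal weights."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def is_independent(a, b):
    """Check P(A and B) == P(A) * P(B) exactly, using fractions."""
    p_both = prob(lambda o: a(o) and b(o))
    return p_both == prob(a) * prob(b)

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def total_is_twelve(o):
    return o[0] + o[1] == 12

# Events on different dice carry no information about each other.
print(is_independent(first_is_six, second_is_even))
# The total depends on the first die, so the product rule fails.
print(is_independent(first_is_six, total_is_twelve))
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.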

Page 6

Standard statistical concepts: Identically Distributed

Two events A and B are identically distributed if P(A) = P(B), i.e. they have the same probability distribution.

Page 7

Standard statistical concepts: Identically Distributed variable

Identically distributed variables have the same probability distribution.

Page 8

Standard statistical concepts: i.i.d.

A collection of two or more random variables $\{X_1, X_2, \ldots, X_n\}$ is independent and identically distributed (i.i.d.) if the variables have the same probability distribution and are independent.

Page 9

Standard statistical concepts: Examples

Example of i.i.d.: All other things being equal, a sequence of dice rolls is i.i.d.

Example of non-i.i.d.: bird nesting patterns in wetlands, where the independent variables are distance from water, length of grass, and depth of water, and the dependent variable is the presence of a nest site. A uniform distribution of these variables on a map would indicate an even distribution; however, a more complex pattern emerges where the variables are spatially dependent.

Page 10

Standard statistical concepts: Correlation

Correlation: A correlation is a single number that describes the degree of relationship between two normally distributed variables. The variables are not designated as dependent or independent. The value of a correlation coefficient can vary from minus one to plus one. A minus one indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation. A correlation of zero means there is no linear relationship between the two variables. When there is a negative correlation between two variables, as the value of one variable increases, the value of the other variable decreases, and vice versa.

Page 11

Standard statistical concepts: Variance and covariance

Variance is a measure of variation equal to the mean of the squared deviations from the mean. It measures the amount of variation within the values of a variable, taking account of all possible values and their probabilities or weightings.

Covariance is a measure of the joint variation of two variables, say X and Y. The range of covariance values is unrestricted. However, if the X and Y variables are first standardized, then covariance is the same as correlation and the range of covariance (correlation) values is from -1 to +1.
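The claim that the covariance of standardized variables equals the correlation can be verified numerically. This is a minimal sketch in plain Python; the altitude and rainfall figures are invented for illustration, and population (divide by n) formulas are assumed throughout.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs)

def standardize(xs):
    """Rescale to mean 0 and standard deviation 1."""
    m, sd = mean(xs), sqrt(variance(xs))
    return [(x - m) / sd for x in xs]

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

# Hypothetical data: altitude (m) and annual rainfall (mm) at five stations.
altitude = [100.0, 200.0, 300.0, 400.0, 500.0]
rainfall = [820.0, 900.0, 1010.0, 1080.0, 1250.0]

cov_raw = covariance(altitude, rainfall)   # unrestricted range, depends on units
cov_std = covariance(standardize(altitude), standardize(rainfall))
r = correlation(altitude, rainfall)

print(cov_raw, cov_std, r)  # cov_std and r agree up to rounding
```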

Page 12

Standard statistical concepts: Correlation

Correlation is a measure of the degree of linear relationship between two variables, say X and Y. While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional: the relationship is the critical aspect. The correlation coefficient may take on any value between minus one and plus one ($-1 \le r \le 1$).

Page 13

Standard statistical concepts: Regression

Regression takes a numerical dataset and develops a mathematical formula that fits the data. The results can be used to predict future behaviour. It works well with continuous quantitative data like weight, speed or age. It is not good for categorical data where order is not significant, like colour, name, gender, nest/no nest. Example: plotting snowfall against height above sea level.

Page 14

Standard statistical concepts: Regression

Y = A + BX. The response variable is Y, and X is the continuous explanatory variable. Parameter A is the intercept and parameter B is the slope. The difference between each data point and the value predicted by the line (the model) is called a residual.
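A least-squares fit of Y = A + BX and its residuals can be sketched as follows. The snowfall and height values are made up for illustration, and ordinary least squares is assumed as the fitting method since the slide does not name one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept: the line passes through the means
    return a, b

def residuals(xs, ys, a, b):
    """Observed value minus the value predicted by the line, per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data: height above sea level (m) and snowfall (cm).
height = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
snowfall = [10.0, 32.0, 48.0, 73.0, 87.0]

a, b = fit_line(height, snowfall)
res = residuals(height, snowfall, a, b)

print(a, b)
print(res)  # residuals of a least-squares line sum to (approximately) zero
```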

Page 15

Standard statistical concepts: Null hypothesis

The null hypothesis, H0, represents a theory that has been put forward because it is believed to be true but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. H0: there is no difference between the two drugs on average.

In general, the null hypothesis for spatial data is that either the features themselves or the values associated with those features are randomly distributed (i.e. no spatial pattern or bias).

Page 16

Relation of i.i.d., regression, and correlation with spatial phenomena

The first law of geography according to Waldo Tobler is "Everything is related to everything else, but near things are more related than distant things." In statistical terms this is called autocorrelation. Because the traditional i.i.d. assumption is not valid for spatially dependent variables (e.g. temperature or crime rate), we need special techniques to handle this type of data (e.g. Moran's I). These techniques usually involve a weight matrix which contains location information. The non-i.i.d. nature of spatially dependent variables carries over into regression and correlation, which require spatial weights.

Page 17

Relation of i.i.d., regression, and correlation with spatial database

Spatial databases are used for spatial data mining (SDM), which includes statistical techniques and more specialised data mining (DM) techniques such as association rules. In this case the data mining algorithms need to have a spatial context. We must explicitly include location information where previously, with the i.i.d. assumption, it was not required. Typical generic data mining activities such as clustering, regression, classification, and association rules all need a spatial context. Spatial DM is used in a broad range of scientific disciplines, such as analysis of crime, modelling land prices, poverty mapping, epidemiology, air pollution and health, natural and environmental sciences, etc. The analyst must be aware of the special techniques required for SDM.

Page 18

Relation of i.i.d., regression, and correlation with spatial database

Spatial databases are also used for pure statistical research (e.g. environmental studies). Those variables that are spatially dependent (e.g. the pH of the soil) need to be clearly identified and special techniques applied to take into account their spatial bias.

Page 19

Unique features of spatial data statistics

General statistics assumes the samples are independently generated, which may not be the case with spatially dependent data.
• Like things tend to cluster together.
• Change tends to be gradual over space.

Page 20

Unique features of spatial data statistics: spatially dependent values

The previous maps illustrate two important features of spatial data:
• Spatial autocorrelation (not independent). Recall that events are independent if the probability that they both occur is equal to the product of the probabilities of the two individual events, i.e. $P(A \cap B) = P(A)\,P(B)$.
• Spatial data is not identically distributed. Recall that two events A and B are identically distributed if P(A) = P(B), i.e. they have the same probability distribution.

Page 21

Unique features of spatial data statistics: Autocorrelation & Spatial Heterogeneity

• Spatial autocorrelation is detected when the value of a variable in a location is correlated with values of the same variable in the neighbourhood (can be measured with Moran's I).
• Spatial heterogeneity is characterized by different values or behaviours through space, which can be measured by Local Indicators of Spatial Association (LISA). It characterizes the non-stationarity of most geographic processes, meaning that global parameters may not accurately reflect the process occurring at a particular location.

Page 22

Spatial Autocorrelation

• Autocorrelation: degree of correlation between neighbouring values.
• Spatial dependency: neighbouring values are similar (i.e. positive spatial autocorrelation).
• Moran's I enables assessment of the degree to which values tend to be similar to neighbouring values. We can observe how autocorrelation varies with distance.
• The Moran scatter plot relates individual values to weighted averages of neighbouring values. The slope of a regression line fitted to the points in the scatter plot gives the global Moran's I.

Page 23

Spatial Autocorrelation: Moran's I

Moran's I measures the average correlation between the value of a variable at one location and the value at nearby locations. The essential idea is to specify pairs of locations that influence each other along with the relative intensity of interaction. Moran's I provides a global view of spatial autocorrelation. We will look at details later.

The range of the Moran's I statistic depends on the spatial weight matrix.

When Moran's I is scaled by its bounds the statistic is restricted to the range ±1.
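As a rough illustration, the commonly used form of the statistic is $I = \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$, where $w_{ij}$ are the spatial weights and $S_0$ is their sum. The sketch below assumes that form, a 4 x 4 grid, and binary rook contiguity weights. The two test patterns are invented: they contain the same set of values arranged differently.

```python
def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols grid (cells sharing an edge)."""
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I for a flat list of values and an n x n weight matrix."""
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)                # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

w = rook_weights(4, 4)

# Same values (eight 1s, eight 0s), different spatial arrangement (row-major order).
clustered = [1, 1, 0, 0,
             1, 1, 0, 0,
             1, 1, 0, 0,
             1, 1, 0, 0]
checkerboard = [1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1]

print(morans_i(clustered, w))     # positive: like values sit beside like values
print(morans_i(checkerboard, w))  # negative: every neighbour is dissimilar
```

Because the mean and variance of the two patterns are identical, any difference in the result comes only from where the values are located.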

Page 24

Spatial Autocorrelation: Case Study

[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]

Page 25

Spatial Autocorrelation

Classical statistical assumptions (i.i.d.) do not hold for spatially dependent data.

Page 26

Unique features of spatial data statistics: First Law of Geography

First law of geography [Tobler]: Everything is related to everything else, but nearby things are more related than distant things.
• People with similar backgrounds tend to live in the same area.
• Economies of nearby regions tend to be similar.
• Changes in temperature occur gradually over space (and time), e.g. equator vs. poles.

Page 27

Spatial Autocorrelation: Moran's I - example

Page 28

Moran's I - example

• The pixel value sets in (b) and (c) are the same but their Moran's I values are different.
• Q: Which dataset, (b) or (c), has higher spatial autocorrelation?

Figure 7.5, p. 190

Page 29

Spatial Autocorrelation: Moran Scatterplot Map

[Figure: Moran scatterplot with axes z and Wz, both crossing at 0. Quadrants: Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH. Map: old-aged population, São Paulo.]
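The quadrant labels can be reproduced with a few lines of code. This sketch assumes six zones arranged in a row, row-standardized adjacency weights, and invented values. It labels each zone by the sign of its deviation z and of its spatial lag Wz (HH, LL, HL, LH), then fits a line to the (z, Wz) points to show that, under these assumptions, the slope matches the global Moran's I.

```python
def chain_weights(n):
    """Row-standardized weights for n zones in a row; neighbours are the adjacent zones."""
    w = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        w.append([1.0 / len(nbrs) if j in nbrs else 0.0 for j in range(n)])
    return w

def moran_scatter(values, w):
    """Return (z, wz): deviations from the mean and the weighted average of neighbours."""
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]
    wz = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    return z, wz

def quadrant(zi, wzi):
    """HH/LL: similar to neighbours; HL/LH: dissimilar to neighbours."""
    if zi >= 0 and wzi >= 0:
        return "HH"
    if zi < 0 and wzi < 0:
        return "LL"
    return "HL" if zi >= 0 else "LH"

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

values = [10.0, 12.0, 2.0, 11.0, 3.0, 4.0]   # hypothetical zone values
w = chain_weights(len(values))
z, wz = moran_scatter(values, w)

labels = [quadrant(a, b) for a, b in zip(z, wz)]
slope = ols_slope(z, wz)
# Global Moran's I; with row-standardized weights n / S0 equals 1 here.
morans_i = sum(a * b for a, b in zip(z, wz)) / sum(a * a for a in z)

print(labels)
print(slope, morans_i)
```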

Page 30

Spatial Heterogeneity

Is there such a thing as an average place with respect to some property (e.g. vegetation)? It is difficult to imagine any subset of the Earth's surface being a representative sample of the whole. GWR (later) addresses the localness of spatial data.

Page 31

Neighbourhood relationship: contiguity matrix

Page 32

Spatial autocorrelation

Spatial autocorrelation is determined both by similarities in position and by similarities in attributes.
• Sampling interval
• Self-similarity

Auto = self
Correlation = degree of relatedness (correspondence)

Page 33

Spatial autocorrelation

In the following slide, each diagram contains 32 white cells and 32 blue cells = 64 cells.
• BB = Blue beside Blue
• BW = Blue beside White
• WW = White beside White
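Counting these joins is straightforward to sketch. The example below assumes an 8 x 8 grid with 32 blue and 32 white cells, rook adjacency (cells that share an edge), and two invented layouts: a checkerboard and a grid split into a blue half and a white half.

```python
def join_counts(grid):
    """Count BB, BW and WW joins between edge-sharing neighbours.

    grid is a list of rows; each cell is 'B' (blue) or 'W' (white).
    Each join is counted once by looking only down and to the right.
    """
    counts = {"BB": 0, "BW": 0, "WW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    # Sort so that 'WB' and 'BW' are counted as the same join type.
                    pair = "".join(sorted(grid[r][c] + grid[rr][cc]))
                    counts[pair] += 1
    return counts

# Dispersed layout: colours alternate in both directions.
checkerboard = [["B" if (r + c) % 2 == 0 else "W" for c in range(8)] for r in range(8)]
# Clustered layout: left four columns blue, right four columns white.
clustered = [["B" if c < 4 else "W" for c in range(8)] for r in range(8)]

print(join_counts(checkerboard))  # all joins are BW
print(join_counts(clustered))     # mostly BB and WW joins
```

Many BW joins relative to BB and WW suggest negative autocorrelation (dispersion), while few BW joins suggest positive autocorrelation (clustering).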

Page 34

Spatial autocorrelation

[Figure: grid patterns labelled Negative (dispersed), Spatial independence, and Positive (spatial clustering).]

Page 35

Spatial regression (SR)

Spatial regression (SR) is a global spatial modeling technique in which spatial autocorrelation among the regression parameters is taken into account. SR is usually performed for spatial data obtained from spatial zones or areas. The basic aim in SR modeling is to establish the relationship between a dependent variable measured over a spatial zone and other attributes of the spatial zone, for a given study area, where the spatial zones are subsets of the study area. While SR is known as a modeling method in the spatial data analysis literature, in the spatial data-mining literature it is considered to be a classification technique.

Page 36

Geographically weighted regression (GWR)

Geographically weighted regression (GWR) is a powerful exploratory method in spatial data analysis. It serves for detecting local variations in spatial behavior and understanding local details, which may be masked by global regression models. Unlike SR, where the regression coefficient for each independent variable and the intercept are obtained for the whole study region, in GWR regression coefficients are computed for every spatial zone. Therefore, the regression coefficients can be mapped and the appropriateness of the stationarity assumption in conventional regression analyses can be checked.

Page 37

Geographically weighted regression (GWR)

GWR is an effective technique for exploring spatial nonstationarity, which is characterized by changes in relationships across the study region leading to varying relations between dependent and independent variables. Hence the need for a better understanding of spatial processes has led to the emergence of local modeling techniques. GWR has been implemented in various disciplines such as the natural, environmental, social and earth sciences.
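The idea of one regression per zone can be sketched with weighted least squares. Everything specific here is an assumption made for illustration: a single explanatory variable, a Gaussian distance kernel with a fixed bandwidth, ten zones along a transect, and invented data in which the true slope is 1 in the west and 3 in the east. Real GWR software offers more kernels, bandwidth selection and diagnostics.

```python
from math import exp

def weighted_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b

def gwr(coords, xs, ys, bandwidth):
    """Fit one local regression per location, weighting observations by distance."""
    fits = []
    for u0, v0 in coords:
        # Gaussian kernel: nearby observations get weights near 1, distant ones near 0.
        ws = [exp(-0.5 * ((u - u0) ** 2 + (v - v0) ** 2) / bandwidth ** 2)
              for u, v in coords]
        fits.append(weighted_fit(xs, ys, ws))
    return fits

# Hypothetical transect of ten zones from west (0) to east (9).
coords = [(float(i), 0.0) for i in range(10)]
xs = [1.0, 3.0, 2.0, 4.0, 1.0, 3.0, 2.0, 4.0, 1.0, 3.0]
# The relationship changes across space: y = x in the west, y = 3x in the east.
ys = [x if i < 5 else 3.0 * x for i, x in enumerate(xs)]

global_fit = weighted_fit(xs, ys, [1.0] * len(xs))   # one model for the whole region
local_fits = gwr(coords, xs, ys, bandwidth=1.0)      # one model per zone

print("global slope:", global_fit[1])
for (u, _), (a, b) in zip(coords, local_fits):
    print(u, round(b, 3))
```

The single global slope falls between the two regimes and describes neither well, whereas the local slopes can be mapped to show where the relationship changes.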

Page 38

Exploring spatial patterning in spatial data values

Two issues:
1. How do variables change from place to place? Is a zone similar to its neighbours?
2. How are variables related? How does the relationship between rainfall and altitude vary from place to place?

Page 39

Local Statistics: moving window

Geographical Weights

• Binary: Rook or queen neighbours

• Distance based

• Boundary or perimeter based.

• Weights can be row-normalized using the number of adjacent cells
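The binary and row-normalized schemes in the list above can be sketched for a regular grid. The function names and the 3 x 3 grid are assumptions for illustration; distance-based and perimeter-based weights are not shown.

```python
def contiguity_weights(rows, cols, rule="rook"):
    """Binary contiguity matrix for a rows x cols grid.

    rook:  neighbours share an edge (up to 4 per cell).
    queen: neighbours share an edge or a corner (up to 8 per cell).
    """
    if rule == "rook":
        steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    elif rule == "queen":
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[r * cols + c][rr * cols + cc] = 1.0
    return w

def row_normalize(w):
    """Divide each row by its number of adjacent cells so that the row sums to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

rook = contiguity_weights(3, 3, "rook")
queen = contiguity_weights(3, 3, "queen")
rook_norm = row_normalize(rook)

# Cell 4 is the centre of the 3 x 3 grid; cell 0 is a corner.
print(sum(rook[4]), sum(queen[4]))   # neighbours of the centre cell
print(sum(rook[0]), sum(queen[0]))   # neighbours of a corner cell
print(rook_norm[4])                  # each rook neighbour of the centre gets 0.25
```

Row normalization makes the weighted sum over a cell's neighbours an average, which keeps cells with many neighbours from dominating.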

Page 40: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Local Univariate measures: moving window

Standard univariate statistics can be computed for a moving window, showing the degree and nature of variation in summary statistics across a region of interest (e.g. we could compute the standard deviation for several windows and assess the degree of variability from place to place).

Geographical weighting schemes can be used for the calculation of local statistics.
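A moving-window standard deviation could look like the following sketch. It assumes a gridded variable, a square window that is simply truncated at the grid edge, and the sample (n - 1) standard deviation; `moving_window_std` is a hypothetical helper name, and the grid values are illustrative.

```python
import numpy as np

def moving_window_std(grid, size=3):
    """Sample standard deviation inside a size-by-size window centred on each cell."""
    grid = np.asarray(grid, dtype=float)
    half = size // 2
    n_rows, n_cols = grid.shape
    out = np.empty_like(grid)
    for r in range(n_rows):
        for c in range(n_cols):
            # The window is clipped where it would run off the grid (an assumption).
            window = grid[max(r - half, 0):r + half + 1,
                          max(c - half, 0):c + half + 1]
            out[r, c] = window.std(ddof=1)
    return out

values = np.array([[45, 44, 44],
                   [43, 42, 39],
                   [38, 32, 34]])
local_std = moving_window_std(values)
print(local_std)
```

Comparing the entries of `local_std` shows how variability changes from place to place.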

Page 41: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Local spatial autocorrelation

Global statistics such as Moran’s I can mask local spatial structure. The local Moran can be used to measure local spatial autocorrelation. Only if there is little or no variation in the local observations do the global observations provide any reliable information on the local areas within the study area. As the spatial variation of the local observations increases, the reliability of the global observation as representative of local conditions decreases.

Page 42: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Local spatial autocorrelation

The weights could be based on rook, queen, distance, or perimeter, and normalized by the number of neighbours (slide 28).

Page 43: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Local spatial autocorrelation

Page 44: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial autocorrelation

[Figure: scale of spatial autocorrelation, from negative (dispersed) through spatial independence to positive (spatial clustering)]

Map A and Map B each represent a distinct geographic region. The number in each region (cell) represents the number of leukaemia cases in that region. These two sets of values have the same mean and standard deviation. In contrast, Moran’s I statistic for the data on Map A is -0.269, and 0.041 for the data on Map B. They differ because the values in the regions have a different spatial arrangement. The contiguity (or weight) matrix used by the Moran’s I calculation will be different and hence we get a different result. A visual inspection of both maps would suggest that A has negative autocorrelation (negative Moran’s I): the neighbouring values tend to be dissimilar, so no clustering of like values is suggested. B has little autocorrelation because its Moran’s I is near zero.

Page 45: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial autocorrelation

[Figure: scale of spatial autocorrelation, from negative (dispersed) through spatial independence to positive (spatial clustering)]

The grids A and B represent two different spatial resolutions over the same area. Grid A contains 16 cells and Grid B contains 64 cells. The strength of spatial autocorrelation is often a function of scale or spatial resolution, as illustrated above using black and white cells. High negative spatial autocorrelation is exhibited in A, since each cell has a different colour from its neighbouring cells. In B, each cell of A is subdivided into four half-size cells, assuming the cell is homogeneous. The strength of spatial autocorrelation among the black and white cells then increases, while the cell arrangement stays the same. This illustrates that spatial autocorrelation varies with the study scale, here increasing from the 4-by-4 case to the 8-by-8 case.
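The scale effect can be reproduced with a small sketch. It assumes grid A is a 4-by-4 checkerboard, grid B is the same pattern with every cell split into a 2-by-2 block, and Moran's I is computed with row-normalised rook weights; the helper names `rook_weights` and `morans_i` are hypothetical.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-normalised rook contiguity matrix for a regular grid."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(values, w):
    """Moran's I as z W z^t / (z z^t), with z the deviations from the mean."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return float(z @ w @ z / (z @ z))

# Grid A: 4-by-4 checkerboard of 0 (white) and 1 (black).
grid_a = np.indices((4, 4)).sum(axis=0) % 2
# Grid B: every cell of A replaced by a homogeneous 2-by-2 block (8-by-8 cells).
grid_b = np.kron(grid_a, np.ones((2, 2), dtype=int))

i_a = morans_i(grid_a, rook_weights(4, 4))
i_b = morans_i(grid_b, rook_weights(8, 8))
print(i_a, i_b)
```

Under these assumptions grid A gives I = -1 (every neighbour differs), while grid B gives a value above zero, because each fine cell now has like-coloured neighbours inside its own block.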

Page 46: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Original data

45 44 44

43 42 39

38 32 34

Values, differences from the mean, and rook weights standardized to sum to 1

| y_i | z_i | w_ij | w_ij z_i |
|---|---|---|---|
| 45 | 4.889 | 0.000 | 0.000 |
| 43 | 2.889 | 0.250 | 0.722 |
| 38 | -2.111 | 0.000 | 0.000 |
| 44 | 3.889 | 0.250 | 0.972 |
| 42 | 1.889 | 0.000 | 0.000 |
| 32 | -8.111 | 0.250 | -2.028 |
| 44 | 3.889 | 0.000 | 0.000 |
| 39 | -1.111 | 0.250 | -0.278 |
| 34 | -6.111 | 0.000 | 0.000 |
| sum | | 1.00 | -0.611 |

Calculate the local Moran I for the central cell (42), where

z_i = (x_i – x̄)

Mean = 40.111

Variance = 21.861

I_i = (1.889 / 21.861) * (-0.611) = -0.053

The central cell has a low negative value: neighbouring values tend to be dissimilar.
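The worked example can be checked with a few lines of code. This sketch assumes the variance quoted on the slide is the sample variance (divisor n - 1), which is the choice that reproduces 21.861; the variable names are illustrative.

```python
import numpy as np

grid = np.array([[45, 44, 44],
                 [43, 42, 39],
                 [38, 32, 34]], dtype=float)

mean = grid.mean()
variance = grid.var(ddof=1)   # sample variance (n - 1), an assumption
z = grid - mean               # differences from the mean

# Rook neighbours of the central cell, each with standardized weight 0.25.
neighbours = [z[0, 1], z[1, 0], z[1, 2], z[2, 1]]
lag = sum(0.25 * zj for zj in neighbours)

# Local Moran I for the central cell.
local_i = z[1, 1] / variance * lag
print(round(mean, 3), round(variance, 3), round(lag, 3), round(local_i, 3))
```

The weighted sum of neighbouring deviations is about -0.611, which gives a local Moran I of about -0.053 for the central cell.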

Page 47: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne
Page 48: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Global Moran’s I = 0.665

Local I, large positive values in rural areas, more patchy around Belfast

Page 49: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Regression

The assumption of i.i.d. underlying ordinary least squares regression rarely holds for spatial data. There are several techniques that handle the spatial case:
• Moving window regression
• Geographically Weighted Regression (GWR)

We will look at GWR.

Page 50: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Geographically Weighted Regression (GWR)

The steps are:
1. Go to a location.
2. Conduct regression using the raw data and a geographic weighting scheme.
3. Move to the next location and go back to step 2 until all locations have been visited.

The output is a set of regression coefficients (e.g. slope and intercept) at each location.
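The three steps can be sketched as a loop over locations. This is a minimal illustration, not a full GWR implementation: it assumes a single explanatory variable, a Gaussian distance kernel with a fixed bandwidth, and made-up data; the function name `gwr` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def gwr(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coefs = np.empty((len(x), 2))
    for i, here in enumerate(coords):            # steps 1 and 3: visit each location
        dist = np.linalg.norm(coords - here, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # Gaussian kernel (assumed)
        xtw = design.T * w                       # weight every observation
        # Step 2: weighted least squares, solving (X'WX) b = X'Wy.
        coefs[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return coefs

# Hypothetical data: five locations with an exact linear relation y = 2 + 3x.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x
coefs = gwr(coords, x, y, bandwidth=10.0)
print(coefs)
```

Because the illustrative relation is the same everywhere, every location recovers the same intercept and slope; with real data the coefficients would vary from place to place.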

Page 51: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Coordinates of observations, variables, distance from first observation, and geographic weights

| point | x | y | Var 1 | Var 2 | dist | Geo w |
|---|---|---|---|---|---|---|
| 1 | 25 | 45 | 12 | 6 | 0 | 1 |
| 2 | 25 | 44 | 34 | 52 | 1 | 0.995 |
| 3 | 21 | 48 | 32 | 41 | 5 | 0.8825 |
| 4 | 27 | 52 | 12 | 25 | 8 | 0.7261 |
| 5 | 16 | 31 | 11 | 22 | 16 | 0.278 |
| 6 | 42 | 35 | 14 | 9 | 20 | 0.0889 |
| 7 | 9 | 65 | 56 | 43 | 26 | 0.034 |
| 8 | 29 | 76 | 75 | 67 | 32 | 0.006 |
| 9 | 61 | 66 | 43 | 32 | 42 | 0.0002 |

Page 52: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Location of points for previous table

Page 53: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Regression using previous table and locations, the geographic weighting pulls the line towards the points with larger weights
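The effect of the geographic weights on the fitted line can be explored with a short sketch. It uses the Var 1, Var 2 and Geo w columns from the earlier table; treating Var 2 as the dependent variable is an assumption, and `fit_line` is a hypothetical helper.

```python
import numpy as np

var1 = np.array([12, 34, 32, 12, 11, 14, 56, 75, 43], dtype=float)
var2 = np.array([6, 52, 41, 25, 22, 9, 43, 67, 32], dtype=float)
geo_w = np.array([1, 0.995, 0.8825, 0.7261, 0.278,
                  0.0889, 0.034, 0.006, 0.0002])

def fit_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    design = np.column_stack([np.ones_like(x), x])
    xtw = design.T * w
    intercept, slope = np.linalg.solve(xtw @ design, xtw @ y)
    return intercept, slope

# Unweighted (global) fit versus the fit weighted towards observation 1.
global_fit = fit_line(var1, var2, np.ones_like(var1))
local_fit = fit_line(var1, var2, geo_w)
print("global intercept, slope:", global_fit)
print("local intercept, slope:", local_fit)
```

Observations near the first point dominate the weighted fit, while distant points such as 8 and 9 contribute almost nothing.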

Page 54: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Summary of spatial stats

Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations.

The local Moran statistic measures spatial dependence on a local basis, allowing the researcher to see its variation over space.

Geographically Weighted Regression allows the parameters of a regression analysis to vary spatially. GWR helps in detecting local variations in spatial behaviour and understanding local details, which may be masked by global regression models. In GWR, regression coefficients are computed for every spatial zone.

Page 55: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

© Oxford University Press, 2010. All rights reserved. Lloyd: Spatial Data Analysis

Page 56: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Two scatter plots and fitted lines for different aggregations of the same values. © Oxford University Press, 2010. All rights reserved. Lloyd: Spatial Data Analysis

Page 57: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Moran’s I

A contiguity matrix may represent a neighbourhood relationship defined using adjacency or Euclidean distance. There are several definitions of adjacency, including a four-neighbourhood and an eight-neighbourhood. Given a gridded spatial framework, a four-neighbourhood assumes that a pair of locations influence each other if they share an edge (rook). An eight-neighbourhood assumes that a pair of locations influence each other if they share either an edge or a vertex (queen).

Page 58: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Moran’s I

• Using a normalised weight matrix the values of I range from -1 to 1.

• Value = 1 : Perfect positive correlation

• Value = 0 : No autocorrelation

• Value = -1: Perfect negative correlation

• A Moran’s I may appear low (say 0.17) yet still be statistically significant; the pattern is then clustered, since the index is above 0.

Page 59: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Moran’s I

• Global Moran’s I

• What is the extent of clustering in the total area?

• Is this clustering significantly different from a random spatial distribution?

• Local Moran’s I

• Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?

• Are these local clusters and spatial outliers statistically significant?

Page 60: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Moran’s I: A measure of spatial autocorrelation

Given $x = \{x_1, \ldots, x_n\}$ sampled over $n$ locations, Moran’s I is defined as

$$I = \frac{z W z^{t}}{z z^{t}}$$

where $z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$ and $W$ is a normalized contiguity matrix.

Fig. 7.5, pp. 190

Page 62: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Second Law of Geography

Second law of geography: spatial heterogeneity [Goodchild]

Spatial heterogeneity describes geographic variation in the constants or parameters of relationships.

When it is present, the outcome of an analysis depends on the area over which the analysis is made.

Spatial heterogeneity depends on the spatial resolution.

A global model might be inconsistent with respect to regional model(s).

Page 63: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Second Law of Geography

Spatial heterogeneity definitions:

Quantitative information characterizing the ground spatial structure (the spatial variance distribution of the variable considered) within the coarse sample resolution (e.g. pixel or grid).

The patterning or patchiness in important landscape properties such as vegetation cover.

Page 64: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Second Law of Geography

Spatial heterogeneity has been quantified from remote sensing images by using two basic approaches:

(a) the direct image approach, where straight reflectance or reflectance indices of remote sensing images are used to quantify spatial heterogeneity, using the original pixel size of the image

(b) the cartographic or patch mosaic approach, where the image is subdivided into homogeneous mapping units through classification.

Page 65: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Second Law of Geography

Suppose there is a relationship between the number of AIDS cases and the number of people living in an area.

The form of this relationship will vary spatially:
• in some areas the number of cases per capita will be higher than in others
• we could map the constant of proportionality

Spatial heterogeneity describes this geographic variation in the constants or parameters of relationships. When it is present, the outcome of an analysis depends on the area over which the analysis is made. Often this area is arbitrarily determined by a map boundary or political jurisdiction.

Page 66: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Second Law of Geography

Second law of geography [Goodchild]: spatial heterogeneity

A global model is often inconsistent with regional models (e.g. the average does not hold anywhere).

Page 67: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

How to decide the weight wij ?

1) Binary wij, also called absolute adjacency. This covers the general case, answering the question: is a value in a region similar to or different from its neighbours?

wij = 1 if two geographic entities are adjacent; otherwise, wij = 0. The adjacency definition can be queen's case (8 neighbours) or rook's case (4 neighbours).

The weight indicates the spatial interaction between entities.

Page 68: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

How to decide the weight wij ?

2) The distance between geographic entities. Often the inverse distance is used: further objects get less weight and nearer objects get more weight, e.g. distance from the centre of an epidemic.

wij = f(dist(i,j)), dist(i,j) is the distance between i and j.

3) The length of common boundary for area entities, e.g. policing borders: shorter borders get less weight.

wij = f(leng(i,j)), leng(i,j) is the length of common boundary between i and j.

The weight indicates the spatial interaction between entities.
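A distance-based weighting scheme of the form wij = f(dist(i,j)) can be sketched as follows. This illustration assumes f is the inverse distance raised to a power, point locations for the entities, and row normalisation; `inverse_distance_weights` and `power` are hypothetical names.

```python
import numpy as np

def inverse_distance_weights(coords, power=1.0):
    """Row-normalised weights w_ij proportional to 1 / dist(i, j) ** power."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = np.zeros_like(dist)
    # An entity is not its own neighbour; coincident points also get weight 0.
    positive = dist > 0
    w[positive] = dist[positive] ** -power
    return w / w.sum(axis=1, keepdims=True)

# Three hypothetical points on a line at x = 0, 1 and 3.
w = inverse_distance_weights([(0, 0), (1, 0), (3, 0)])
print(w)
```

For the first point, the neighbour at distance 1 receives three times the weight of the neighbour at distance 3. A boundary-length scheme would follow the same pattern with shared border lengths in place of inverse distances.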

Page 69: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

How to decide the weight wij ?1

The choice of weights should ultimately be driven by a rationale for including those areas as neighbors that have a spatial effect on a given location. This rationale can be derived from theory or be the result of using ESDA to experiment with different weights and connectivity orders. Since weights matrices are used to create spatial lags that average neighboring values, the choice of a weights matrix will determine which neighboring values will be averaged. For instance, since rook weights will usually have fewer neighbors than queen weights, on average, each neighboring observation has more influence.

Page 70: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

How to decide the weight wij ?1

The question of which weights to choose is more pertinent in the context of modeling than ESDA since modeling is based on substantive notions of spatial effects while ESDA prioritizes the rejection of spatial randomness. Therefore, if there are no substantive reasons to guide the choice of weights in ESDA, using a weights file with as few neighbors as possible (such as rook) makes sense. Especially with irregular areal units (as opposed to grids), the difference between rook and queen weights is often minimal. However, it is advisable to test how sensitive your results are to your weights specifications by comparing multiple weights matrices.

Page 71: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

Global outliers are observations which appear inconsistent with the remainder of that data set.

Global outliers deviate so much from other observations that it may be possible that they were generated by a different mechanism.

Spatial outliers are observations that appear inconsistent with their neighbours.

Page 72: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

Detecting spatial outliers has important applications in transportation, ecology, public safety, public health, climatology and location based services.

Geographic objects have a spatial component (location, shape, metric & topological properties) and a non-spatial component (house owner, sensor id., soil type).

Page 73: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

Spatial neighbourhoods may be defined using spatial attributes & spatial relations.

Comparisons between spatially referenced objects can be based on non-spatial attributes.

A spatial outlier is a spatially referenced object whose non-spatial attribute values differ from those of other spatially referenced objects in its spatial neighbourhood.

Page 74: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data for Outlier detection

In diagram on left G,P,S,Q show a big change in attribute for a small change in location. The right hand diagram shows a normal distribution (corresponds to attribute axis in left diagram)

Page 75: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

The upper left & lower right quadrants of figure 7.17 indicate a spatial association of dissimilar values: low values surrounded by high value neighbours (P & Q) and high values surrounded by low values (S).

Page 76: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

A Moran outlier is a point located in the upper left or lower right quadrant of a Moran scatter plot.

Page 77: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Outlier Detection

A Moran outlier is a point located in the upper left or lower right quadrant of a Moran scatter plot.

[Figure: Moran scatter plot with z (values in a given location) on the horizontal axis and Wz on the vertical axis, both axes crossing at 0. Quadrant labels on the slide: Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH.]
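The quadrant test can be sketched in code. This is a minimal illustration assuming a row-normalised weight matrix and made-up values along a line of five cells; `moran_quadrants` is a hypothetical helper, and the labels follow the HH/LL/HL/LH convention used on the slide.

```python
import numpy as np

def moran_quadrants(values, w):
    """Label each location by its Moran scatter plot quadrant."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()      # horizontal axis: deviation of the value at a location
    lag = w @ z           # vertical axis: weighted average of neighbouring deviations
    labels = []
    for zi, li in zip(z, lag):
        if zi >= 0 and li >= 0:
            labels.append("HH")   # upper right: high value, high neighbours
        elif zi < 0 and li < 0:
            labels.append("LL")   # lower left: low value, low neighbours
        elif zi >= 0:
            labels.append("HL")   # lower right: high value, low neighbours
        else:
            labels.append("LH")   # upper left: low value, high neighbours
    return labels

# Five cells in a row; each cell's neighbours are the cells next to it.
w = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0, 0.0]])
values = [10, 10, 50, 10, 10]
labels = moran_quadrants(values, w)
outliers = [i for i, q in enumerate(labels) if q in ("HL", "LH")]
print(labels, outliers)
```

The high central value falls in the HL quadrant, and its two low neighbours fall in the LH quadrant, so all three are flagged as Moran outliers.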

Page 78: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Model Evaluation

Consider the two-class classification problem ‘nest’ or ‘no-nest’. The four possible outcomes (or predictions) are shown on the next slide. The desired predictions are:
1) where the model says there should be a nest and there is an actual nest (True Positive)
2) where the model says there is no nest and there is no nest (True Negative)

The other outcomes are not desirable and point to a flaw in the model.
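The four outcomes can be counted with a short sketch. The data below are hypothetical, with True meaning ‘nest’; the two undesired outcomes are conventionally called False Positive (nest predicted, none present) and False Negative (nest present, none predicted). `confusion_counts` is an illustrative helper name.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes of a two-class prediction (True = 'nest')."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # nest predicted, nest present
        elif not a and not p:
            counts["TN"] += 1   # no nest predicted, no nest present
        elif p:
            counts["FP"] += 1   # nest predicted, no nest present
        else:
            counts["FN"] += 1   # no nest predicted, nest present
    return counts

# Hypothetical observations at five locations.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```

Accuracy here is the share of desired outcomes (true positives plus true negatives) among all predictions.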

Page 79: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Model Evaluation

Page 80: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Statistical Models

A Point Process is a model for the spatial distribution of points in a point pattern. Examples: the position of trees in a forest, the location of petrol stations in a city.

Actual real world point patterns can be compared (using distance) with a randomly distributed point pattern.
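One common distance-based comparison is the nearest neighbour ratio (the Clark-Evans index), which the lecture does not name; it is used here as an assumed example. The sketch divides the observed mean nearest neighbour distance by the value expected for a completely random pattern of the same density, 1 / (2 * sqrt(n / area)), and ignores edge effects.

```python
import numpy as np

def nearest_neighbour_ratio(points, area):
    """Observed mean nearest neighbour distance over its expectation under randomness."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbour
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(len(pts) / area)
    return observed / expected

# Hypothetical patterns in a unit square: a regular 10-by-10 lattice,
# and the same lattice squeezed into one corner (clustered).
ticks = (np.arange(10) + 0.5) / 10
regular = np.array([(x, y) for x in ticks for y in ticks])
clustered = regular * 0.1

r_regular = nearest_neighbour_ratio(regular, area=1.0)
r_clustered = nearest_neighbour_ratio(clustered, area=1.0)
print(r_regular, r_clustered)
```

A ratio near 1 suggests a random pattern, values above 1 suggest dispersion, and values below 1 suggest clustering.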

Page 81: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Calculating the Local Moran I

Where the variance = 667.32 and mean = 55.82 from the entire population

Page 82: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Calculating the Local Moran I

Page 83: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Calculating the Global Moran I

Page 84: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Statistics versus Data Mining

Do we know the statistical properties of data? Is data spatially clustered, dispersed, or random?

Data mining is strongly related to statistical analysis.

Data mining can be seen as a filter (exploratory data analysis) before applying a rigorous statistical tool.

Data mining generates hypotheses that are then verified.

The filtering process does not guarantee completeness (wrong elimination or missing data).

"Drowning in Data yet Starving for Knowledge"

Page 85: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining: Outline

• Background to data mining & spatial data mining
• The data mining process
• Spatial autocorrelation, i.e. the non-independence of phenomena in a contiguous geographic area
• Spatial independence
• Classical data mining concepts: classification, clustering, association rules
• Spatial data mining, e.g. co-location rules
• Summary

Page 86: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining

Data mining is the process of discovering interesting and potentially useful patterns of information embedded in large databases.

Spatial data mining has the same goals as conventional data mining but requires additional techniques that are tailored to the spatial domain.

A key goal of spatial data mining is to partially automate knowledge discovery, i.e., search for “nuggets” of information embedded in very large quantities of spatial data.

Page 87: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining

Data mining lies at the intersection of database management, statistics, machine learning and artificial intelligence. DM provides semi-automatic techniques for discovering unexpected patterns in very large data sets.

We must distinguish between operational systems (e.g. bank account transactions) and decision support systems (e.g. data mining).

Page 88: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining

Spatial DM can be characterised by Tobler’s first law of geography (near things tend to be more related than far things). This means that the standard DM assumption that values are independently and identically distributed does not hold in spatially dependent data (SDD). The term spatial autocorrelation captures this property, which needs to be included in DM techniques.

Page 89: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining

The important techniques in conventional DM are association rules, clustering, classification, and regression. These techniques need to be modified for spatial DM. Two approaches are used when adapting DM techniques to the spatial domain:
1) Correct the underlying (i.i.d.) statistical model.
2) Modify the objective function which drives the search to include a spatial term.

Page 90: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining

Size of spatial data sets:
• NASA’s Earth Orbiting Satellites capture about a terabyte (10^12 bytes) a day; YouTube 2008 = 6 terabytes.
• Environmental agencies, utilities (e.g. ESB), the Central Statistics Office, government departments such as health/agriculture, and local authorities all have large spatial data sets.

It is very difficult to analyse such large data sets manually.

For examples see Chapter 7 from SDT.

Page 91: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining: Sub-processes

Data mining involves many sub-processes:
• Data collection: usually data was collected as part of the operational activities of an organization, not for the data mining task. It is unlikely that the data mining requirements were considered during data collection.
• Data extraction/cleaning: hence data must be extracted & cleaned for the specific data mining task.

Page 92: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Data Mining: Sub-processes

Feature selection.
Algorithm design.
Analysis of output.
The level of aggregation at which the data is being analysed must be decided. Identical experiments at different levels of scale can sometimes lead to contradictory results (e.g. the choice of basic spatial unit can influence the results of a social survey).
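
The effect of the basic spatial unit can be seen in a small sketch. The nine cell values and the two zonings below are invented for illustration; the same cells grouped into different zones give a different count of zones with a "yes" majority.

```python
def zones_with_majority(values, zoning):
    """Count the zones in which more than half of the cells have value 1."""
    return sum(
        1 for zone in zoning
        if 2 * sum(values[i] for i in zone) > len(zone)
    )


# Nine survey cells (1 = "yes", 0 = "no"); hypothetical data.
values = [1, 1, 0, 1, 1, 0, 0, 0, 0]

# Two different ways of grouping the same cells into three zones.
zoning_a = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
zoning_b = [(0, 1, 3), (4, 2, 5), (6, 7, 8)]

majority_a = zones_with_majority(values, zoning_a)  # 2 of 3 zones say "yes"
majority_b = zones_with_majority(values, zoning_b)  # 1 of 3 zones says "yes"
```

The underlying cells are identical in both runs; only the zone boundaries differ.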

Page 93: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Geographic Data Mining Process

Close interaction between Domain Expert & Data-Mining Analyst

The output consists of hypotheses (data patterns) which can be verified with statistical tools and visualised using a GIS.

The analyst can interpret the patterns and recommend appropriate actions.

Page 94: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Statistics versus Data Mining

Do we know the statistical properties of the data? Is the data spatially clustered, dispersed, or random?

Data mining is strongly related to statistical analysis.
Data mining can be seen as a filter (exploratory data analysis) applied before a rigorous statistical tool.
Data mining generates hypotheses that are then verified.
The filtering process does not guarantee completeness (wrong elimination or missing data).

Page 95: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Unique features of spatial data mining

The difference between classical & spatial data mining parallels the difference between classical & spatial statistics.

Statistics assumes the samples are independently generated, which is generally not the case with SDD.

Like things tend to cluster together.
Change tends to be gradual over space.

Page 96: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Non-Spatial Descriptive Data Mining

Descriptive analysis is an analysis that results in some description or summarization of data. It characterizes the properties of the data by discovering patterns in the data, which would be difficult for the human analyst to identify by eye or by using standard statistical techniques. Description involves identifying rules or models that describe data. Both clustering and association rules are employed by supermarket chains.

Clustering (unsupervised learning) is a descriptive data mining technique. Clustering is the task of assigning cases into groups of cases (clusters) so that the cases within a group are similar to each other and are as different as possible from the cases in other groups. Clustering can identify groups of customers with similar buying patterns and this knowledge can be used to help promote certain products. Clustering can help locate the crime ‘hot spots’ in a city.

Association Rules. Association rule discovery (ARD) identifies the relationships within data. The rule can be expressed as a predicate in the form (IF x THEN y). ARD can identify product lines that are bought together in a single shopping trip by many customers and this knowledge can be used by a supermarket chain to help decide on the layout of the product lines.
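
The clustering task described above can be sketched with a very small k-means loop. k-means is only one possible clustering method and is used here as an assumption; the points and the starting centres are invented (think of them as incident locations in two parts of a city).

```python
def kmeans(points, centres, iterations=10):
    """Tiny k-means: assign each point to its nearest centre, then move the centres."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            # Index of the centre with the smallest squared distance to p.
            nearest = min(
                range(len(centres)),
                key=lambda k: (p[0] - centres[k][0]) ** 2 + (p[1] - centres[k][1]) ** 2,
            )
            groups[nearest].append(p)
        # Each centre moves to the mean of its group (unchanged if the group is empty).
        centres = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres, groups


# Hypothetical incident locations forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, groups = kmeans(points, centres=[(0, 0), (10, 10)])
```

The number of clusters and the starting centres are supplied by hand here; choosing them is part of the analyst's job.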

Page 97: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Non-Spatial Predictive Data Mining

Predictive DM results in some description or summarization of a sample of data which predicts the form of unobserved data. Prediction involves building a set of rules or a model that will enable unknown or future values of a variable to be predicted from known values of another variable.

Classification is a predictive data mining technique. Classification is the task of finding a model that maps (classifies) each case into one of several predefined classes. Classification is used in risk assessment in the insurance industry.

Regression analysis is a predictive data mining technique that uses a model to predict a value. Regression can be used to predict sales of new product lines based on advertising expenditure.
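
The regression example can be sketched as an ordinary least-squares line fit. The advertising and sales figures below are made up so that the arithmetic is easy to follow; they are not real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical advertising spend and resulting sales (arbitrary units).
advertising = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(advertising, sales)
predicted_sales = slope * 5 + intercept  # prediction for an unseen spend of 5
```

The fitted line assumes the observations are independent, which is exactly the assumption that spatial data tends to violate.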

Page 98: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Case Study

Data from 1995 & 1996 concerning two wetlands on the shores of Lake Erie, USA.

Using this information we want to predict the spatial distribution of a marsh-breeding bird called the red-winged blackbird. Where will they build nests? What conditions do they favour?

A uniform grid (pixel=5 square metres) was superimposed on the wetland.

Seven attributes were recorded.
See link to Spatial Databases: A Tour for details.

Page 99: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Case Study

Page 100: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Case Study

The significance of three key variables was established with statistical analysis:

Vegetation durability
Distance to open water
Water depth
The spatial distribution is shown in 7.3.

Page 101: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Case Study

[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]

Example showing different predictions: (a) the actual locations of nests; (b) pixels with actual nests; (c) locations predicted by one model; and (d) locations predicted by another model. Prediction (d) is spatially more accurate than (c).

Page 102: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Classical statistical assumptions do not hold for spatially dependent data

Page 103: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Case Study

The previous maps illustrate two important features of spatial data:

Spatial autocorrelation (observations are not independent).
Spatial data is not identically distributed.
Two random variables are identically distributed if and only if they have the same probability distribution.
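
One common way to put a number on spatial autocorrelation is Moran's I. The sketch below is a minimal version that assumes binary neighbour weights (1 for adjacent cells, 0 otherwise); the two transects are invented values, not the wetland data.

```python
def morans_i(values, pairs):
    """Moran's I with weight 1 for each listed neighbour pair (counted in both directions)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each undirected pair (i, j) contributes w_ij and w_ji, hence the factor 2.
    cross = 2 * sum(dev[i] * dev[j] for i, j in pairs)
    total_weight = 2 * len(pairs)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Six cells along a transect; each cell neighbours the next one.
pairs = [(i, i + 1) for i in range(5)]

clustered = morans_i([1, 1, 1, 0, 0, 0], pairs)    # like values sit together
alternating = morans_i([1, 0, 1, 0, 1, 0], pairs)  # neighbours always differ
```

With these inputs the clustered transect gives a value of about 0.6 and the alternating one about -1.0; values near 0 would suggest no spatial pattern.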

Page 104: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Why spatial DBs do not use classical DM

Rich data types (e.g., extended spatial objects).

Implicit spatial relationships among the variables.

Observations that are not independent.
Spatial autocorrelation exists among the features.

Page 105: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Classical Data Mining

Association rules: Determination of interaction between attributes. For example: X -> Y.
Classification: Estimation of the attribute of an entity in terms of attribute values of another entity. Some applications are:
Predicting locations (shopping centers, habitat, crime zones)
Thematic classification (satellite images)

Clustering: Unsupervised learning, where classes and the number of classes are unknown. Uses a similarity criterion. Applications: clustering pixels from a satellite image on the basis of their spectral signature, identifying hot spots in crime analysis and disease tracking.

Regression: takes a numerical dataset and develops a mathematical formula that fits the data. The results can be used to predict future behavior. Works well with continuous quantitative data like weight, speed or age. Not good for categorical data where order is not significant, like color, name, gender, nest/no nest.

Page 106: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Determining the Interaction among Attributes

We wish to discover relationships between attributes of a relation.

is_close(house,beach) -> is_expensive(house)

low(vegetationDurability) -> high(stem density)

Associations & association rules are often used to select subsets of features for more rigorous statistical correlation analysis.

Page 107: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

How does data mining differ from conventional methods of data analysis?

Using conventional data analysis, the analyst formulates and refines the hypothesis. This approach is known as hypothesis verification: patterns in the data are identified by a human analyst who formulates and refines the hypothesis. For example, "Did the sales of cream increase when strawberries were available?"

Using data mining, the hypothesis is formulated and refined without human input. This approach is known as hypothesis generation: patterns in the data are identified by hypotheses that are automatically formulated and refined. Knowledge discovery is where the data mining tool formulates and refines the hypothesis by identifying patterns in the data. For example, "What are the factors that determine the sales of cream?"

Page 108: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Association rules

An association rule is a pattern that can be expressed as a predicate in the form (IF x THEN y), where x and y are conditions (about cases). The rule states that if x (the antecedent) occurs then, in most cases, so will y (the consequent). The antecedent may contain several conditions but the consequent usually contains only one term.

Page 109: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Association rules

Association rules need to be discovered. Rule discovery is a data mining technique that identifies relationships within data. In the non-spatial case rule discovery is usually employed to discover relationships within transactions or between transactions in operational data. The relative frequency with which an antecedent appears in a database is called its support. The support threshold is the relative frequency at or above which support is considered high, i.e. significant (say 70%).

Page 110: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Association rules

Example: Market basket analysis is a form of association rule discovery that discovers relationships in the purchases made by a customer during a single shopping trip. An itemset in the context of market basket analysis is the set of items found in a customer’s shopping basket.
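
A minimal sketch of market basket analysis follows. Support is computed as the relative frequency of an itemset, as defined earlier; confidence is taken here in its usual sense, the fraction of baskets containing the antecedent that also contain the consequent. The baskets and the 70% threshold are illustrative assumptions.

```python
def support(itemset, transactions):
    """Relative frequency of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Four hypothetical shopping baskets.
baskets = [
    {"strawberries", "cream"},
    {"strawberries", "cream", "bread"},
    {"strawberries"},
    {"bread"},
]

strawberry_support = support({"strawberries"}, baskets)           # 3 of 4 baskets
rule_confidence = confidence({"strawberries"}, {"cream"}, baskets)  # 2 of those 3
is_frequent = strawberry_support >= 0.7                            # threshold of 70%
```

Only antecedents that pass the support threshold would normally be carried forward to rule generation.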

Page 111: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Association rules

Page 112: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Association rules & Spatial Domain

Differences with respect to the spatial domain:

1. The notion of transaction or case does not exist, since the data are immersed in a continuous space. Partitioning the space may introduce errors through overestimation or underestimation of confidences. The notion of transaction is replaced by neighborhood.

2. Itemsets are smaller in the spatial domain. Thus, the cost of generating candidates is not a dominant factor. The enumeration of neighbours dominates the final computational cost.

3. In most cases, the spatial items are discrete versions of continuous variables.
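
Point 3 can be illustrated with a small discretisation step that turns a continuous measurement into an item such as low(...) or high(...). The cut-off values and the variable name are assumptions chosen for the example.

```python
def discretise(value, low_cut, high_cut, name):
    """Map a continuous value to a discrete item usable in an association rule."""
    if value < low_cut:
        return f"low({name})"
    if value >= high_cut:
        return f"high({name})"
    return f"medium({name})"


# Hypothetical water depth readings (cm) for three grid cells.
depths = [10, 45, 80]
items = [discretise(d, low_cut=20, high_cut=60, name="waterDepth") for d in depths]
```

The choice of cut-offs affects which rules are found, so it should be agreed with the domain expert.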

Page 113: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Association Rules

Table 7.5 shows examples of association rules, support, and confidence that were discovered in Darr 1995 wetland data.

Page 114: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Co-Location rules

Colocation rules attempt to generalise association rules to point collection data sets that are indexed by space. The colocation pattern discovery process finds frequently co-located subsets of spatial event types given a map of their locations, see Figure 7.12.
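
A rough sketch of how co-location might be measured for two event types: for each type, compute the fraction of its instances that have an instance of the other type within a given radius. This is a simplified form of the participation ratio used in the co-location literature; the coordinates and the radius are invented.

```python
from math import dist


def participation_ratio(points_a, points_b, radius):
    """Fraction of points in points_a with at least one point of points_b within radius."""
    near = sum(1 for a in points_a if any(dist(a, b) <= radius for b in points_b))
    return near / len(points_a)


# Hypothetical locations of two spatial event types.
type_a = [(0, 0), (5, 5), (10, 0)]
type_b = [(0, 1), (5, 6)]

ratio_a = participation_ratio(type_a, type_b, radius=1.5)  # 2 of 3 A points have a B nearby
ratio_b = participation_ratio(type_b, type_a, radius=1.5)  # both B points have an A nearby
pattern_strength = min(ratio_a, ratio_b)
```

The nested loop enumerates every pair of points, which reflects the earlier remark that neighbour enumeration dominates the cost; a spatial index would be used in practice.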

Page 115: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Co-location Examples

(a) Illustration of Point Spatial Co-location Patterns. Shapes represent different spatial feature types. Spatial features in sets {+, x} and {o, *} tend to be located together.

(b) Illustration of Line String Co-location Patterns. Highways and frontage roads are co-located, e.g., Hwy100 is near frontage road Normandale Road.

Page 116: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Answers: and

Two co-location patterns

Page 117: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Spatial Association Rules

A spatial association rule is a rule indicating certain association relationships among a set of spatial and possibly some non-spatial predicates.

Spatial association rules (SPAR) are defined in terms of spatial predicates rather than items.

P_1 ∧ P_2 ∧ ... ∧ P_n -> Q_1 ∧ ... ∧ Q_m

Where at least one of the terms (P or Q) is a spatial predicate.

is(x,country) ∧ touches(x,Mediterranean) -> is(x,wine-exporter)

Page 118: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Co-location V Association Rules

Transactions are disjoint while spatial co-location is not. Something must be done. Three main options:
1. Divide the space into areas and treat them as transactions
2. Choose a reference point pattern and treat the neighbourhood of each of its points as a transaction
3. Treat all point patterns as equal
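
Option 1 can be sketched as follows: overlay a square grid and treat the set of event types in each cell as a transaction. The event list and the cell size are hypothetical. The example also shows the weakness of this option, since two nearby events can fall on either side of a cell boundary.

```python
def grid_transactions(events, cell_size):
    """Group event types by grid cell; each cell's set of types acts as one transaction."""
    cells = {}
    for kind, x, y in events:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, set()).add(kind)
    return cells


# (type, x, y) for four hypothetical events; the last two are only 2 units apart.
events = [("A", 1, 1), ("B", 2, 1), ("A", 19, 5), ("B", 21, 5)]
transactions = grid_transactions(events, cell_size=10)
```

Here the first A and B share a cell and form the transaction {A, B}, while the second A and B are split between neighbouring cells, so that co-location is missed.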

Page 119: Spatial Databases: Lecture 7 Spatial Statistics DT249-4 DT228-4 Semester 2 2010 Pat Browne

Co-location V Association Rules

Spatial Association Rules Mining (SARM) is similar to the raster view in the sense that it tessellates a study region S into discrete groups based on spatial or aspatial predicates derived from concept hierarchies. For instance, a spatial predicate close_to(α, β) divides S into two groups, locations close to β and those not. So, close_to(α, β) can be either true or false depending on α’s closeness to β. A spatial association rule is a rule that consists of a set of predicates in which at least one spatial predicate is involved. For instance, is_a(α, house) ∧ close_to(α, beach) -> is_expensive(α). This approach efficiently mines large datasets using a progressive deepening approach.
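
The house and beach rule can be checked against data with a short sketch. The distance threshold, the price cut-off and the house records are all assumptions for illustration, and every record is taken to be a house so the is_a predicate is trivially true.

```python
from math import dist


def close_to(a, b, threshold):
    """Spatial predicate: true when a lies within threshold of b."""
    return dist(a, b) <= threshold


def is_expensive(house, cutoff=500):
    """Non-spatial predicate on the price attribute."""
    return house["price"] >= cutoff


beach = (0, 0)
houses = [
    {"loc": (0, 1), "price": 900},
    {"loc": (0, 2), "price": 800},
    {"loc": (0, 9), "price": 300},
    {"loc": (0, 1.5), "price": 350},
]

# Houses that satisfy the antecedent close_to(house, beach).
near_beach = [h for h in houses if close_to(h["loc"], beach, threshold=3)]
# Of those, the houses that also satisfy the consequent.
near_and_expensive = [h for h in near_beach if is_expensive(h)]

rule_support = len(near_and_expensive) / len(houses)
rule_confidence = len(near_and_expensive) / len(near_beach)
```

Changing the threshold changes which houses count as close, so the predicate definition directly controls the support and confidence that are reported.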
