
GeoFIS: Spatial data processing for decision making

Natalia Iglesias Documentation - Rosario 2019

Contents
● Overview
○ Introduction
○ Architecture of GeoFIS
○ Download and Install
● Starting with GeoFIS
○ User Interface
○ Vocabulary
○ Data Formats
○ Data Visualization
● Basic GeoFIS operations
○ Filter
○ Variogram
○ Interpolation
● Advanced GeoFIS operations
○ Zoning
○ Opportunity Index
○ Data Fusion
● Tutorials
○ Dataset
○ Project creation
○ Filter and spatial structure evaluation
○ Vector to Raster
○ Data Aggregation
○ Zoning
● References
○ Publications
○ FisPro

Overview

Introduction
● GeoFIS is a free and open source software platform for high spatial resolution data processing with a decision support perspective.
● GeoFIS is developed by a group of researchers from several French research and education institutions in the field of agriculture and environment (INRA, Irstea, Montpellier SupAgro).

Architecture of GeoFIS [1]
[Figure: architecture of GeoFIS]

Download and Install
● https://www.geofis.org/en/install/
○ Requirements
■ Java installed (version >= 1.8)
■ R software (version >= 3.5)

Starting with GeoFIS

User Interface
● The main window consists of a menu, a left panel showing the object hierarchy of the GeoFIS project, and a right panel showing maps and graphical objects.
● A project is a set of maps. It is saved as an XML file.
● Maps can be added to or removed from a project. A map can have one or more information layers.
● An information layer can be added to a map of the project.
● Six types of operations can be applied to an information layer.

Vocabulary
● A project is a set of maps. It can be saved in an XML file.
● A map is composed of one or more information layers.
● A layer contains data. Each layer can come from a different type of file.

Data Formats
● Input data
○ Different types of layers can be added:
■ Vector data from ESRI shapefiles: points, lines or polygons.
■ Data from CSV or shapefile files.
■ Raster data from GeoTIFF (tif, tiff), World file (jpeg, png, gif) or JPEG 2000 (jp2, j2k, jpeg2000) image files. Raster data are used as a base map; no operation is available for raster data.

Watch the video tutorial to import data (https://www.geofis.org/en/documentation-en/starting-with-geofis/#video-starting-import-data).

Data Formats
● Input data
○ CSV file
■ The first two columns must be the (x, y) coordinates.
■ A coordinate system must be selected.
■ Supported delimiters: comma, semicolon, tab and space.
■ The first row must contain the attribute names (header: one line).
■ The CSV file must contain only numeric data (with the point “.” as decimal separator), without missing values.

Data Formats
● Input data
○ Restrictions on attribute names (inherited from R software). The first row must contain the attribute names.
■ The first character must be a letter.
■ Other characters can be letters, digits or the underscore “_” as separator.
■ Accented characters are not allowed.
■ Each attribute name must be unique.
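The CSV rules above can be checked before import. The following is a minimal sketch in Python, not part of GeoFIS: the function name `check_csv` and the regular expression are illustrative assumptions, and the check is deliberately strict (ASCII letters only).

```python
import csv
import io
import re

# Hypothetical helper, not a GeoFIS function.
# ASCII letter first, then letters, digits or "_" (no accented characters).
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def check_csv(text, delimiter=","):
    """Return a list of problems found in the CSV text (empty list if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if len(header) < 2:
        problems.append("need at least two columns for the (x, y) coordinates")
    for name in header:
        if not NAME_RE.match(name):
            problems.append(f"invalid attribute name: {name!r}")
    if len(set(header)) != len(header):
        problems.append("attribute names are not unique")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: wrong number of fields")
            continue
        for value in row:
            try:
                float(value)  # "." is the only accepted decimal separator
            except ValueError:
                problems.append(f"line {line_no}: non-numeric or missing value {value!r}")
    return problems

good = "x,y,EC_Deep\n1.0,2.0,9.5\n3.0,4.0,10.1\n"
bad = "x,y,1st,x\n1.0,2.0,,3\n"
print(check_csv(good))
print(check_csv(bad))
```

Note that `float` also accepts strings such as `nan` or `inf`, so a stricter check may be needed for real data.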

Data Formats
● Coordinate systems
○ There are 2 types:
■ Geographic coordinate systems (unprojected): a reference system using latitude and longitude to define the location of points on the surface of the earth (e.g. WGS84 in degrees).
■ Projected coordinate systems: a map projection is the systematic transformation of locations on the earth (latitude/longitude) to planar coordinates (e.g. UTM in meters).
○ Geographic coordinate systems (lat/lon) are good for locating positions on the surface of the earth, but lat/lon is not efficient for computing distances and areas.
○ GeoFIS uses only projected coordinate systems.

Data Visualization
● Style: a style can be applied to a layer to visualize its data.
○ The “Default” style.
○ The “Geometry” style (vector layers).
■ The layer data are displayed according to their geometry (point, polygon, line). You can choose the shape, color, size and label of the data. All data in the layer are displayed the same way.
○ The “Palette” style (vector or raster layers).
■ The layer data are displayed in different shades of color based on the value of an attribute. You can choose the attribute, the color palette, the number of classes and the classification method (equal intervals, Jenks, equal count).
● The style can be exported to or imported from an SDL file.

Watch the video tutorial on styles (https://www.geofis.org/en/documentation-en/starting-with-geofis/#video-starting-style)

Data Visualization
● Geometry style:
○ Allows you to change the geometry, size and color of the points. All data in the layer are shown in the same way.

Data Visualization
● Palette style:
○ You can choose the color palette, the number of classes and the classification method for each attribute.
○ Class: the number of bins into which the attribute values are divided.
○ Palettes: define the color scheme. There are 4 kinds of color palettes: numerical and sequential palettes follow a gradient; diverging palettes typically range between three distinct colors; qualitative palettes consist of easily distinguishable colors.
○ Classifier: the algorithm used to create the breaks automatically.
■ Equal interval: divides the input values into bins of equal range.
■ Jenks: identifies groups of similar values in the data and maximizes the differences between categories.
■ Equal count: ensures that the same number of observations falls into each bin.
■ Unique interval: takes into account unique values.
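To make the difference between the equal interval and equal count classifiers concrete, here is a small Python sketch. It is not the GeoFIS implementation; the function names are illustrative, and other tools may use slightly different quantile conventions for the equal count breaks.

```python
def equal_interval_breaks(values, k):
    """k bins of equal width between min and max: returns k + 1 break values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]

def equal_count_breaks(values, k):
    """k bins holding (as nearly as possible) the same number of observations."""
    ordered = sorted(values)
    n = len(ordered)
    # Break i is the first element of bin i; the last break is the maximum.
    return [ordered[(i * n) // k] for i in range(k)] + [ordered[-1]]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(equal_interval_breaks(data, 3))  # [1.0, 34.0, 67.0, 100]
print(equal_count_breaks(data, 2))     # [1, 6, 100]
```

With one extreme value (100), the equal interval breaks leave most bins nearly empty, while the equal count breaks still split the data evenly.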

Basic GeoFIS operations

Filter
● This operation analyses the distribution of the data using a histogram with k bins, where the bin counts m_i sum to the number of data points, $n = \sum_{i=1}^{k} m_i$, with the objective of quickly locating and filtering out outliers. A new “filtered” layer is generated.
● The number of bins (k) and the break values can be customized. Different choices are possible, including equally spaced bins, bins with an equal number of elements, or manually selected break values.

R function: hist
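The histogram-based filter can be sketched as follows in Python. This is an illustration only, not the R `hist` call that GeoFIS relies on: the function names are hypothetical and the bin convention (left-closed bins, last bin closed on both sides) is an assumption.

```python
def histogram_counts(values, breaks):
    """Count values per bin; bins are [b0, b1), ..., [b_{k-1}, b_k] (last bin closed)."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if breaks[i] <= v < breaks[i + 1] or (is_last and v == breaks[-1]):
                counts[i] += 1
                break
    return counts

def filter_range(values, low, high):
    """Keep only the values inside [low, high]; the rest are treated as outliers."""
    return [v for v in values if low <= v <= high]

data = [9.1, 9.8, 10.2, 10.5, 11.0, 9.9, 10.1, 55.0]
breaks = [0, 10, 20, 30, 40, 50, 60]
counts = histogram_counts(data, breaks)
print(counts)                      # [3, 4, 0, 0, 0, 1]
print(sum(counts) == len(data))    # True: the bin counts m_i sum to n
print(filter_range(data, 0, 20))   # the isolated value 55.0 is dropped
```

The empty bins between the bulk of the data and the isolated value are what make the outlier easy to spot on the histogram.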

Semi-variogram
● Describes the degree of spatial dependence of the data.
○ The variogram model often needs expert tuning to fit the data set (spatial resolution, shape and size of the area under study).
○ The variogram model can be saved to a .variogram file (XML) for reuse on new data or for export to other software.

$$\gamma(h) = \frac{1}{2N} \sum \left[ Z(x) - Z(x+h) \right]^2$$

The semivariance is calculated for several distances h, where Z(x) is the value of the variable at a site x, Z(x + h) is another sample value separated from the previous one by a distance h, and N is the number of pairs that are separated by that distance.
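The semivariance formula can be illustrated with a short Python sketch. It is not the gstat `variogram` function: the function name, the fixed-width lag bins and the left-closed bin convention are assumptions made for the example.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag bin."""
    sq_sums = [0.0] * n_lags   # sum of squared differences per bin
    d_sums = [0.0] * n_lags    # sum of pair distances per bin
    counts = [0] * n_lags      # N: number of pairs per bin
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            b = int(h // lag_width)
            if b < n_lags:
                sq_sums[b] += (values[i] - values[j]) ** 2
                d_sums[b] += h
                counts[b] += 1
    return [(d_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_lags) if counts[b] > 0]

# Four points on a line, one unit apart, with values 1, 2, 4, 7.
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
vals = [1.0, 2.0, 4.0, 7.0]
result = empirical_semivariogram(pts, vals, lag_width=1.0, n_lags=4)
for distance, gamma, n_pairs in result:
    print(distance, gamma, n_pairs)
```

For h = 1 there are N = 3 pairs with squared differences 1, 4 and 9, so the semivariance is 14 / 6; it increases with distance, as expected for spatially structured data.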

Semi-variogram
Selection of variogram parameters:
● Boundaries (when selected) lets you change the settings:
○ Number of points defines the number of intervals in which the semivariance is calculated,
○ Starting distance defines the position of the first interval,
○ Max distance defines the maximum distance considered.
● Cloud (when selected) lets you calculate the semivariogram cloud.
● Once done, click on Compute.

R package: gstat
R function: variogram

Semi-variogram
● It is possible to choose the variogram model that best fits the data.

Selection of variogram model and associated parameters:
● Theoretical model:
○ Exponential: similar to spherical, but the sill is only approached asymptotically; the range is taken where 95% of the sill is reached
○ Gaussian: follows the shape of a normal probability distribution curve
○ Linear: spatial variability increases linearly with distance
○ Spherical: (almost) linear up to the range, at which the phenomenon stabilizes

R package: gstat
R function: vgm

Semi-variogram

Selection of variogram model and associated parameters:

● Nugget: the value at which the semi-variogram (almost) intercepts the y-axis.
● Partial sill: the value that the semivariogram model attains at the range (the value on the y-axis) is called the sill. The partial sill is the sill minus the nugget.
● Range: the distance at which the model first flattens out.

RMSE: measures the difference between the proposed model and the observed points; it helps to fit the best possible model.

At any time you can reset the model and its parameters.
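The roles of the nugget, partial sill and range can be seen in a small Python sketch of the theoretical models. This is an illustration under assumptions, not the gstat `vgm` code: the function names are hypothetical, the exponential and Gaussian forms use the range as a scale parameter, and the linear model is assumed unbounded.

```python
import math

def variogram_model(h, model, nugget, partial_sill, rng):
    """Semivariance at distance h for a few common theoretical models."""
    if h == 0:
        return 0.0  # by convention the semivariance is zero at distance zero
    if model == "spherical":
        if h >= rng:
            return nugget + partial_sill  # flat at the sill beyond the range
        r = h / rng
        return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    if model == "exponential":
        return nugget + partial_sill * (1.0 - math.exp(-h / rng))
    if model == "gaussian":
        return nugget + partial_sill * (1.0 - math.exp(-(h / rng) ** 2))
    if model == "linear":
        return nugget + partial_sill * h / rng  # assumption: no sill is reached
    raise ValueError(f"unknown model: {model}")

def rmse(predicted, observed):
    """Root mean square error between model values and observed semivariances."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

print(variogram_model(10, "spherical", nugget=1.0, partial_sill=4.0, rng=10))  # 5.0 (the sill)
print(variogram_model(30, "exponential", nugget=0.0, partial_sill=1.0, rng=10))
```

With this parameterisation the exponential model reaches about 95% of the sill at three times the range parameter, which is why a "practical range" is often quoted for it.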

Interpolation
● GeoFIS can be used to interpolate data, that is, to convert vector data to raster data, with:
○ a deterministic method, inverse distance weighting (IDW), which creates surfaces from measured points based on neighbor values;
○ a geostatistical method (kriging), which relies on the spatial autocorrelation of the data and yields an estimation variance.

We observe a property of a phenomenon at a limited number of sample locations and are interested in its value at unsampled locations, so it has to be predicted there.

Interpolation
● Deterministic method (inverse distance (IDW))

IDW is a simple method for estimating the value at unsampled locations:

$$z^* = \frac{\sum_{i} z_i \, d_{0,i}^{-p}}{\sum_{i} d_{0,i}^{-p}}$$

where z* is the estimated value at an unsampled point, z_i is the value at location s_i, d_{0,i} is the distance from the unsampled point to the i-th data location s_i, and p is the power selected for the inverse distance estimation.

● The search distance determines how many points will be used.
● Each measured point has a local influence that diminishes proportionally to the inverse of the distance raised to the power p. If p = 0, there is no decrease with distance. The default value is p = 2; there is no theoretical justification for this choice.
● The output grid is made of square cells organised in a regular lattice.

R package: gstat
R function: idw
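The IDW estimate at a single location can be sketched in a few lines of Python. This is an illustration, not the gstat `idw` function; the function name and the `max_distance` argument (standing in for the search distance) are assumptions.

```python
import math

def idw(target, points, values, p=2.0, max_distance=math.inf):
    """Inverse distance weighted estimate at `target` from the sampled points."""
    numerator = 0.0
    denominator = 0.0
    for point, z in zip(points, values):
        d = math.dist(target, point)
        if d == 0:
            return z  # the target coincides with a sample: return its value
        if d > max_distance:
            continue  # sample lies outside the search distance
        weight = d ** -p
        numerator += weight * z
        denominator += weight
    if denominator == 0:
        return None  # no sample within the search distance
    return numerator / denominator

pts = [(0, 0), (2, 0)]
vals = [10.0, 20.0]
print(idw((1, 0), pts, vals))             # midpoint: equal weights
print(idw((0.5, 0), pts, vals, p=2))      # closer to the first sample
print(idw((0.5, 0), pts, vals, p=0))      # p = 0: plain mean, no decrease with distance
```

Applying the function to the center of every cell of a regular grid gives the raster surface; a larger p makes the surface follow the nearest samples more closely.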

Interpolation
● Geostatistical method (Kriging)

Ordinary kriging is based on the assumption that variation is random and spatially dependent, and that the random process is intrinsically stationary, with a constant mean and a variance that depends only on separation in distance and direction.

● The grid size determines the resolution of the estimated map. The ideal choice depends on the application and the data distribution. A larger grid size means more data.
● Border: import the map contours (.shp polygon file, or created from the data). Without a border, a convex hull is used; in this case the grid depends on the data locations.

R package: gstat
R function: krige

Interpolation
● Geostatistical method (Kriging)

Import the variogram created by the variogram process. The nugget, sill, range and max distance values are filled in automatically.

Select the number of nearest neighbours.

Advanced GeoFIS operations

Zoning
● Zoning is the process of dividing land into zones. The type of zone determines a site-specific management of the land.
● GeoFIS uses a segmentation algorithm to ‘zone’ data layers, inspired by an image-processing region-merging algorithm.
● Segmentation is the process of defining zones inside an image. Segmentation methods can be classified into two main families: contour-based and region-based. The first family is more suitable for object recognition, and the second is useful when there are no well-defined borders. This last case may well correspond to agricultural data zoning.
● The segmentation algorithm operates on either irregular or gridded (interpolated) data to generate potential management zones.

Zoning
● Algorithm [4]
○ Start: one point = one zone
○ Iterate on:
■ Merge the pair of neighbouring zones that are closest in the attribute space
■ Update the zone list and the zone neighbours
○ Until all zones are merged
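The loop above can be sketched in Python as follows. This is a simplified illustration, not the GeoFIS implementation: it uses a single attribute, takes the neighbour relation as a given list of index pairs instead of deriving it from a Voronoi tessellation, and uses the maximum pairwise difference as zone distance. All names are hypothetical.

```python
def zone_distance(values, zone_a, zone_b):
    """Maximum pairwise attribute distance between the points of two zones."""
    return max(abs(values[i] - values[j]) for i in zone_a for j in zone_b)

def region_merging(values, neighbour_pairs, n_zones=1):
    """Merge neighbouring zones, closest first, until n_zones remain."""
    zones = {i: [i] for i in range(len(values))}   # start: one point = one zone
    neighbours = {i: set() for i in zones}
    for a, b in neighbour_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    while len(zones) > n_zones:
        candidates = [(zone_distance(values, zones[a], zones[b]), a, b)
                      for a in zones for b in neighbours[a] if a < b]
        if not candidates:
            break  # the remaining zones are not connected
        _, a, b = min(candidates)
        # Merge zone b into zone a, then update the zone list and the neighbours.
        zones[a].extend(zones.pop(b))
        for c in neighbours.pop(b):
            neighbours[c].discard(b)
            if c != a:
                neighbours[c].add(a)
                neighbours[a].add(c)
    return sorted(sorted(zone) for zone in zones.values())

# Five points along a transect; each point neighbours the next one.
values = [1.0, 1.2, 1.1, 5.0, 5.3]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(region_merging(values, pairs, n_zones=2))  # [[0, 1, 2], [3, 4]]
```

Stopping at a chosen number of zones, rather than merging everything, is what allows a given partition to be selected for visualization.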

Zoning
● The input parameters drive the algorithm:
○ the border can be used to limit the processed area,
○ the neighborhood relation can be filtered by a minimal common edge length, and
○ various distances are used for zone aggregation.
● Border options: convex hull or file. Only data points within the border polygon are processed.
● At the start, a Voronoi tessellation is used to convert each data point to a zone and to define the initial neighbourhood.

Zoning
● Neighborhood: all or line segment length. In the latter case, two Voronoi polygons must share an edge of at least the specified minimal length to be considered neighbors.

Zoning
● The algorithm can be parameterised by different criteria for the distance metric:
○ Univariate distance: Euclidean or fuzzy [5, 6]. It computes the distance between two data points in the one-dimensional attribute space.
○ Multivariate combination: Euclidean (p=2) or Minkowski. The combination is needed when the zoning is done according to several attributes. In this case, each univariate (elementary) distance is computed and normalized to the unit interval. These partial distances are then aggregated to yield the distance between two data points in the multidimensional attribute space.
○ Zone distance: Minimum, Mean or Maximum. At each step, the algorithm merges the two zones with the minimum zone distance. To compute the distance between two zones, all the data points included in the two zones are considered and the aggregation is done using this parameter. Maximum is the default value.
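The three levels of distance can be sketched in Python as below. This is only an illustration of the idea: the normalisation by the attribute range, the division by the number of attributes in the Minkowski combination and all function names are assumptions, not the GeoFIS definitions.

```python
def univariate_distance(a, b, lo, hi):
    """Euclidean distance in one attribute, normalised to [0, 1] by the attribute range."""
    return abs(a - b) / (hi - lo)

def minkowski_combination(partial_distances, p=2.0):
    """Combine normalised partial distances; p = 2 gives the Euclidean combination."""
    n = len(partial_distances)
    # Dividing by n keeps the result in [0, 1]; this normalisation is an assumption.
    return (sum(d ** p for d in partial_distances) / n) ** (1.0 / p)

def aggregate_zone_distance(pair_distances, mode="maximum"):
    """Zone distance from the distances between all pairs of points of two zones."""
    if mode == "minimum":
        return min(pair_distances)
    if mode == "mean":
        return sum(pair_distances) / len(pair_distances)
    if mode == "maximum":
        return max(pair_distances)
    raise ValueError(f"unknown mode: {mode}")

# Two points described by two attributes with illustrative ranges [5, 15] and [1, 5].
d_first = univariate_distance(9.0, 11.0, lo=5.0, hi=15.0)   # 0.2
d_second = univariate_distance(2.0, 3.0, lo=1.0, hi=5.0)    # 0.25
print(minkowski_combination([d_first, d_second], p=2))
print(aggregate_zone_distance([0.1, 0.4, 0.25], mode="maximum"))  # 0.4
```

With the Maximum zone distance, two zones are merged only if even their most dissimilar points are close, which makes it the most cautious of the three options.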

Zoning
● A given number of zones can be selected for visualization (Number of zones).

Zoning
● Post-processing allows a final filtering of small zones (according to the area or the number of points).
● When a small zone is enclosed in another zone, it is simply merged with its surrounding neighbor. When a small zone shares a border with several neighbors, it is merged with the closest one, according to the between-zone distance.

Watch the video tutorial

Zoning

The zoning algorithm produces a map with several attributes for each zone. This map can be exported as a shapefile.

Attributes (example shown for zone 3):
● Geometry of the zone: a polygon delimiting the zone
● Unique identifier of the zone
● Number of points inside the zone
● Area of the zone
● For each attribute: mean and standard deviation values

Data Fusion
● Information fusion is done with a specific goal:
○ for instance, parameter estimation from various sensors, or risk level evaluation from different sources of information. The objective is to compare different alternatives (locations, sites or zones in the case of spatial data) according to the whole information.
● Only values on the same scale and with the same meaning can be aggregated. When the data are of the same kind, as in sensor fusion, there is no problem. This is not true in the general case.
● The most popular aggregation operator is the weighted mean, but its modeling power is limited.

Data Fusion
● Steps
○ Each information layer is transformed into an expert layer: each numerical attribute is transformed into a degree (from 0 to 1) according to rules, using a fuzzy membership function.
○ Expert layers are combined using an aggregation operator: WAM, OWA or FIS.

Data Fusion
● All the attributes must belong to the same layer. An attribute must be selected as an input.
● Select the function that turns raw data into a satisfaction degree. Four types of membership functions (MF) are available:
○ Semi-trapezoidal inf: low values are preferred
○ Semi-trapezoidal sup: high values are preferred
○ Trapezoidal: values around an interval are preferred
○ Triangular: values about a given value are preferred
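The four membership function shapes can be written as small Python functions. This is a sketch for illustration: the function and parameter names (a, b, c, d as breakpoints) are assumptions and may not match the way GeoFIS names or orders its parameters.

```python
def semi_trapezoidal_inf(x, a, b):
    """Degree 1 for x <= a, decreasing linearly to 0 at b: low values are preferred."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def semi_trapezoidal_sup(x, a, b):
    """Degree 0 for x <= a, increasing linearly to 1 at b: high values are preferred."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def trapezoidal(x, a, b, c, d):
    """Degree 1 on [b, c], 0 outside [a, d], linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def triangular(x, a, b, c):
    """Degree 1 at b, 0 outside [a, c], linear in between."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

print(semi_trapezoidal_sup(2.0, 1.0, 3.0))     # 0.5
print(trapezoidal(9.0, 7.0, 8.0, 11.0, 12.0))  # 1.0
print(triangular(7.5, 0.0, 5.0, 10.0))         # 0.5
```

Each function maps a raw attribute value to a satisfaction degree in [0, 1], which puts attributes with different units on a common scale before aggregation.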

Data Fusion
● Membership function parameters

Data Fusion
● Aggregation

An aggregation operator is defined for each aggregated variable. Three are currently available:
○ WAM (weighted arithmetic mean): the weights are assigned to the information sources;
○ OWA (ordered weighted averaging): the weights are assigned to the positions in the distribution, that is, to the values once sorted;
○ FIS: a fuzzy inference system (FIS) including linguistic rules. Linguistic rules are used within fuzzy inference systems for approximate reasoning.
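The difference between WAM and OWA is easiest to see on a small example. The Python sketch below is illustrative only: the function names are hypothetical, and the OWA ordering (weights applied to the degrees sorted from largest to smallest) is the usual convention, assumed here rather than taken from GeoFIS.

```python
def wam(degrees, weights):
    """Weighted arithmetic mean: weights[i] belongs to information source i."""
    return sum(w * d for w, d in zip(weights, degrees)) / sum(weights)

def owa(degrees, weights):
    """Ordered weighted average: weights[i] belongs to the i-th largest degree."""
    ordered = sorted(degrees, reverse=True)
    return sum(w * d for w, d in zip(weights, ordered)) / sum(weights)

# Satisfaction degrees of one location for three attributes.
degrees = [0.2, 1.0, 0.6]
print(wam(degrees, [0.5, 0.25, 0.25]))  # the first source counts double
print(owa(degrees, [0, 0, 1]))          # all weight on the smallest degree: the minimum
print(owa(degrees, [1, 0, 0]))          # all weight on the largest degree: the maximum
```

With WAM, changing the order of the sources changes the result; with OWA it does not, because the weights follow the sorted values rather than the sources.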

Data Fusion
● FIS aggregation

FIS aggregation operator:
○ Granularity: the number of linguistic terms for each variable.
○ Rules are used within the FIS for approximate reasoning. The maximum number of rules is given by the product of the input granularities.
○ Rule conclusions: must be in [0, 1]; may be crisp or fuzzy.

Data Fusion

The data fusion result can be saved to ESRI shapefiles for use in the zoning process.

Data Fusion

The data fusion result can be saved to CSV files for post-processing with R.

Tutorials

Workflow [1]
● Generic flow of data in precision agriculture, with the main processing steps from raw data processing to decision-making.

Process: From raw data to dataset

Outside of GeoFIS

Raw data: Coordinates converter
● WGS 84 to UTM
○ R scripts: from/to shapefile, from/to CSV file
○ To find the EPSG code: http://epsg.io/
○ Convert Geographic Units

Raw data: Border creation
● Using QGIS
○ First: add the data layer. Result: data layer added. [Screenshot with numbered steps 1 to 8]
○ Second: create a layer for the border. [Screenshot with numbered steps 1 to 6]
○ Third: create the border and save it to the layer. [Screenshot with numbered steps 1 to 7]
■ Draw the polygon: add points on the map with a left mouse click and terminate with a right click. Set id = 1.
■ Finish editing: toggle the editing mode and save the layer.

Dataset
● CropLoad_2014S (precision viticulture example)
● The data were obtained from various sensors and mapped onto a common grid (more information: https://efficientvineyard.com)
● File: CropLoad_2014S.csv
● Variables
○ EC_Deep (apparent electrical conductivity - soil variability)
○ PW (pruning weight - vine size)
○ Crop Load (ratio of crop weight to pruning weight)

Process: From dataset to information layers

Add Layers
● Data Layer [Screenshot with numbered steps 1 to 8]
● Data layer added: zoom and point information (e.g. point 1601) [Screenshot with numbered steps 1 and 2]
● Border Layer (shapefile): border layer added [Screenshot with numbered steps 1 to 5]

Filtering (remove erroneous data)
● Histogram of the variables [Screenshot with numbered steps 1 to 3]

Spatial structure evaluation
● Semi-variogram of the variables [Screenshot with numbered steps 1 to 13]

Visualization
● Style change on border [Screenshot with numbered steps 1 to 6]

Visualization
● Style change on data: visual data analysis by classifier selection

Vector to Raster
● IDW with border and without border [Screenshot with numbered steps 1 to 9]

Vector to Raster
● IDW with border and different p values
[Figure panels: p = 1, p = 2, p = 3, p = 4]

Vector to Raster
● IDW with border and different max distance values
[Figure panels: Dm = 10, Dm = 100, Dm = ∞, Dm = 5.440, Dm = 1000]

Vector to Raster
● Kriging with border and without border [Screenshot with numbered steps 1 to 11]

Vector to Raster
● Kriging with border and different N min and N max settings
[Figure panels: N min = 1, N max = 100; N min = 4, N max = 90; N min = 1, N max = 10]

Process: From information to decision

Question to answer
● Where do I have potential to increase production (yield)?
○ Multicriteria decision according to:
■ Apparent electrical conductivity (EC_Deep)
■ Pruning weight (PW)
■ Crop Load
[Figure panels: EC_Deep, PW, CropLoad]

Data fusion
● Input attribute selection and satisfaction degree setting according to expert knowledge [Screenshot with numbered steps 1 to 4]

Data fusion
● Expert knowledge:
○ 8 mS/m < EC_Deep < 11 mS/m is good -> degree = 1
○ PW > 2.75 lb/vine is good -> degree = 1
○ 5 < Crop Load < 10 is good -> degree = 1
[Screenshot with numbered steps 1 to 3]

Data fusion
[Three screenshots with numbered steps 1 to 9]

Interpretation of data fusion using zoning
● Zoning on fusion WAM layer [Screenshot with numbered steps 1 to 11]
● Prudent decision: Zone distance = Maximum

Interpretation of data fusion using zoning
● Zoning on fusion OWA layer
[Figure panels: #zone = 1, #zone = 2, #zone = 3, #zone = 4, #zone = 5]

Interpretation of data fusion using zoning
● Zoning on fusion FIS layer

Interpretation of data fusion using zoning
● Zoning on fusion FIS layer with fuzzy distance [Screenshot with numbered steps 1 to 5]

Summary
● GeoFIS
○ open source
○ flexible
○ easy to use
○ introduces expert knowledge
○ decision support
○ interoperable with other tools
○ open to user needs or contributions

References

Publications
● [1] C. Leroux, H. Jones, L. Pichon, S. Guillaume, J. Lamour, J. Taylor, O. Naud, T. Crestey, J. Lablee, and B. Tisseyre, “GeoFIS: an open source, decision-support tool for precision agriculture data,” Agriculture, vol. 8, iss. 6, 2018.
● [2] S. Guillaume, B. Charnomordic, and B. Tisseyre, “Open source software for modelling using agro-environmental georeferenced data,” in IEEE International Conference on Fuzzy Systems, Brisbane, Australia, 2012, pp. 1074-1081.
● [3] S. Guillaume, B. Charnomordic, B. Tisseyre, and J. Taylor, “Soft computing-based decision support tools for spatial data,” International Journal of Computational Intelligence Systems, vol. 6, pp. 18-33, 2013.
● [4] M. Pedroso, J. Taylor, B. Tisseyre, B. Charnomordic, and S. Guillaume, “A segmentation algorithm for the delineation of management zones,” Computers and Electronics in Agriculture, vol. 70, iss. 1, pp. 199-208, 2010.
● [5] S. Guillaume, B. Charnomordic, and P. Loisel, “Fuzzy partitions: a way to integrate expert knowledge into distance calculations,” International journal of information sciences, vol. 245, pp. 76-95, 2013.
● [6] S. Guillaume and B. Charnomordic, “Fuzzy partition-based distance practical use and implementation,” in IEEE International Conference on Fuzzy Systems, paper f-1136, Hyderabad, India, 2013.
● [7] P. Roudier, B. Tisseyre, H. Poilvé, et al., Precision Agric (2011) 12: 130.

FisPro
● https://www.fispro.org/es/
● FisPro is an open source toolbox to design and optimize fuzzy inference systems (FIS). Among fuzzy software products, FisPro stands out because of the interpretability of fuzzy systems automatically learnt from data. Interpretability is guaranteed in each step of the FIS design with FisPro: variable partitioning, rule induction, optimization. FisPro includes several modules: fuzzy partitioning, rule and partition learning, inference and FIS optimization.
● Reference: S. Guillaume and B. Charnomordic, “Learning interpretable fuzzy inference systems with FisPro,” Information Sciences, vol. 181, iss. 20, pp. 4409-4427, 2011, ISSN 0020-0255.
