ANALYSIS OF SATELLITE IMAGES TO TRACK DEFORESTATION

A Degree Thesis submitted to the Faculty of the Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona, Universitat Politècnica de Catalunya, by Irene Šimić de Torres, in partial fulfilment of the requirements for the degree in SCIENCE AND TELECOMMUNICATION TECHNOLOGIES ENGINEERING.

Advisors: Philippe Salembier, Andres Pérez Uribe

Barcelona, June 2016


Abstract
Deforestation around the world, especially in the tropics, is a pressing problem that is not being monitored appropriately, leading to belated reactions by environmental organizations and governments and to new deforestation fronts every year.
A project called Terra-I was created a few years ago with the aim of changing this by carrying out a temporal analysis of satellite images from the NASA MODIS satellite, in order to provide near-real-time information and accelerate the reaction to new deforestation fronts.
At first this was done only for the Amazonian forest and the whole of South America, but there are plans to expand it to the entire tropics.
Thanks to the launch of the Landsat 8 satellite, which provides images with improved resolution, this project proposes adding a spatial analysis to improve the results of the deforestation detection carried out by Terra-I, with the aid of machine learning algorithms.
Resum
Worldwide deforestation, especially in the tropics, is a very important current problem that is not being controlled appropriately, causing environmental organizations and governments to react late to new deforestation and, therefore, letting it increase year after year.
A few years ago, the Terra-I project was created with the aim of changing this by carrying out a temporal analysis of images from the NASA MODIS satellite to process near-real-time information and thus accelerate the reaction to new deforestation fronts.
It was initially carried out for areas of the Amazon rainforest and for the whole of South America, but work is under way to extend it to the entire tropics.
Thanks to the launch of the Landsat 8 satellite, with improved information in the images obtained, this project proposes the idea of adding a spatial analysis to improve the results of the deforestation detection carried out by Terra-I with the help of machine learning algorithms.
Resumen
Worldwide deforestation, especially in the tropics, is a very important current problem that is not being controlled appropriately, causing environmental organizations and governments to react late to new deforestation and, therefore, letting it increase year after year.
A few years ago, the Terra-I project was created with the aim of changing this by carrying out a temporal analysis of images from the NASA MODIS satellite to process near-real-time information and thus accelerate the reaction to new deforestation fronts.
It was initially carried out for areas of the Amazon rainforest and for the whole of South America, but work is under way to extend it to the entire tropics.
Thanks to the launch of the Landsat 8 satellite, with improved information in the images obtained, this project proposes the idea of adding a spatial analysis to improve the results of the deforestation detection carried out by Terra-I with the help of machine learning algorithms.
For those who were by my side during these last 5 years
Acknowledgements
I would like to thank my advisors, Philippe Salembier and Andres Pérez, who never hesitated to help me develop this project whenever I needed advice and who made it what it became, and Julien Rebetez, for guiding me during these last five months, always with a smile, and for helping me out of any trouble I ran into during the project.
I would also like to thank all the new friends I made in Switzerland, who were going through the same thing and with whom I spent countless afternoons working at the office. You made the work easier and I am really thankful for that.
Finally, I would like to thank my family and my partner, who were always there to cheer me up when everything was upside down.
Thank you all for these amazing five months that you have been there for me.
Document revision history
Revision 1: 10/07/2016
Revision 2: 15/07/2016
Table of contents .............................................................................................................. 7
List of Figures ................................................................................................................... 9
List of Tables .................................................................................................................. 10
1.3. Workplan .......................................................................................................... 14
1.3.2. Gantt Diagram ........................................................................................... 16
2.1. Machine Learning ............................................................................................. 19
3.1.1. Libraries .................................................................................................... 25
3.2. Data ................................................................................................................. 25
3.3.1. Optimal model ........................................................................................... 29
3.3.1.2. Classification quality estimation ............................................................... 30
3.3.1.3. Definition of the model ............................................................................. 31
4. Results .................................................................................................................... 37
4.1. Training with the final model and Logistic Regression model ............................ 37
4.1.1. Training with upper half ............................................................................. 39
4.1.1.1. The model created during this project ..................................................... 39
4.1.1.2. Logistic Regression ................................................................................. 40
4.1.2.1. The model performed during this project ................................................. 42
4.1.2.2. Logistic Regression ................................................................................. 42
4.2.2. Detections ................................................................................................. 46
5. Budget ..................................................................................................................... 51
Bibliography .................................................................................................................... 54
Figure 2.1: Example of dataset .................................................................................. p.20
Figure 2.2: Example of neural network for three classes ............................................. p.21
Figure 2.3: Example of convolutional neural network for CIFAR-10 ............................ p.22
Figure 3.1: Jupyter notebook sheet example .............................................................. p.24
Figure 3.2: Image sizes tests, where 0 is forest, 1 is field and 2 is clouds .................. p.27
Figure 3.3: example of labelling in a South American region ....................................... p.28
Figure 3.4: example of a good learning process’ loss and accuracy graphic ............... p.29
Figure 3.5: Representation of the model chosen ......................................................... p.32
Figure 3.6: First convolutional layer filter’s extracted features for both bands ............. p.33
Figure 3.7: Band 4 values for forest, field and clouds.................................................. p.33
Figure 3.8: Band 5 values for forest, field and clouds.................................................. p.34
Figure 3.9: Representation of the classification process with the trained model .......... p.35
Figure 3.10: Output values after the classification process on figure 3.9 ..................... p.36
Figure 4.1: South American satellite zone and image used for this experiment ........... p.37
Figure 4.2: labelling for the region 227_65_290 .......................................................... p.38
Figure 4.3: labelling for the region 227_65_172 .......................................................... p.38
Figure 4.4: labelling for the region 1_86 ...................................................................... p.38
Figure 4.5: Region classified on the experiment “training with the upper half” ............. p.39
Figure 4.6: Result of the classification with our model for the 1st training .................... p.40
Figure 4.7: Result of the classification with logistic regression for the 1st training ........ p.41
Figure 4.8: Region classified on the experiment “training with the lower half” ............. p.41
Figure 4.9: Result of the classification with our model for the 2nd training .................... p.42
Figure 4.10: Result of the classification with logistic regression for the 2nd training ..... p.43
Figure 4.11: Classification for 2013 image from wet season ....................................... p.44
Figure 4.12: Classification for 2015 image from wet season ....................................... p.45
Figure 4.13: Classification for 2013 image from dry season ........................................ p.45
Figure 4.14: Classification for 2015 image from dry season ........................................ p.45
Figure 4.15: Deforestation detection wet season ........................................................ p.46
Figure 4.16: Deforestation detection dry season ......................................................... p.46
Figure 4.17: Final detection on 2013 image for wet season ........................................ p.47
Figure 4.18: Final detection on 2015 image for wet season ........................................ p.47
Figure 4.19: Final detection on 2013 image for dry season ......................................... p.48
Figure 4.20: Final detection on 2015 image for dry season ......................................... p.48
Figure 4.21: Our detection .......................................................................................... p.49
Figure 4.22: Tree cover loss detection ........................................................................ p.49
Table 3.1: Band designations for Landsat 8 ............................................................. p.26
Table 4.1: classification report on our model for the 1st training ............................... p.38
Table 4.2: classification report on logistic regression for the 1st training ................... p.39
Table 4.3: classification report on our model for the 2nd training ............................... p.41
Table 4.4: classification report on logistic regression for the 2nd training .................. p.42
1. Introduction
It is well known that forests are vital to the planet, providing critical ecosystem services, livelihoods for people and shelter for wildlife but, despite that fact, human activities leading to deforestation are rapidly threatening them.
Half of the world’s tropical forests (which cover about 47% of the world’s forest) have
been destroyed over the last century and, according to the WWF Living Forests1 model,
up to 170 million hectares of additional deforestation will occur by 2030 if business as
usual continues.
Figure 1.1 below shows a WWF map with the current deforestation fronts.
Figure 1.1: WWF map with current deforestation fronts
Although, as stated previously, humans have clearly had profound impacts on the Earth's natural ecosystems, in many parts of the world the scale and pattern of habitat loss remains unmonitored or only roughly monitored.
As a result, conservation and sustainable development decisions intended to manage those impacts on ecosystem services are taken without a complete understanding of the current state and recent history of land cover and land use change.
In response to this, researchers at CIAT (the International Center for Tropical Agriculture) created a project called Terra-I2 [1], which builds a deforestation map by monitoring near-real-time information with machine learning algorithms.
Those algorithms allowed Terra-I to monitor the information only in a temporal way, comparing images of the same place at different points in time pixel by pixel, which led to quite good results.
1 http://wwf.panda.org/wwf_news/?245370/Over-80-of-future-deforestation-confined-to-just-11-places
2 http://www.terra-i.org
The reason for monitoring only in a temporal way is that, although spatial analysis was a very good option, at that time the best sensor for this kind of detection (and the one used by Terra-I), NASA MODIS, could measure the greenness of the Earth's surface every 16 days, but only with a 250 m resolution, which is too coarse to perform the detection in a spatial way.
However, satellite remote sensing of the Earth has made great progress since Terra-I started, and in 2013 a new satellite with a 30 m spatial resolution, Landsat 8, was launched.
This launch prompted a reconsideration of the possibility of developing a spatial analysis of the images, since a 30 m spatial resolution could be enough for the algorithms to distinguish the type of land cover in the pictures. Adding this analysis to the temporal one would greatly improve the detection, and that is how the main goal of this bachelor's thesis was set.
The following sections introduce the objectives of this work.
1.1. Statement of purpose
The main goal of this bachelor's thesis is to create a new algorithm based on spatial information that classifies Landsat 8 satellite images into forest and field areas, in order to be able to track deforestation.
First, this implies making the algorithm able to distinguish between the different levels of greenness and structures of the terrain, relating those colours and patterns to fields or forests.
This leads to the creation of a model able to carry out the classification with a good trade-off between precision and complexity.
After this step, the model must be trained with a certain number of tropical forest images so that some tropical zones can be classified with good precision. As Terra-I has done until now, we give priority to the forests of South America, since they have been the world's most important deforestation front for a long time.
The final objective is to show, by analysing the results of the experiments carried out at the end of the project, that this implementation improves the deforestation detection, works as well as the previous algorithms and even corrects some of the failures found before.
1.2. Requirements and specifications
Project requirements:
- Python language to develop the code.
- The Keras library to create the model.
- Landsat 8 satellite images to train and test the model.
Project specifications
- Create a classification model able to distinguish between the different levels of greenness of forests around the globe, depending on the season.
- Train the model with specific tropical forest images so that the detection can be extended to the whole tropics.
- Detect the levels of greenness in the same zone in different years in order to track the deforestation that occurred in that period of time.
1.3. Workplan
WP#1: Planning
Major constituent: Planning of the project and its specifications. First contact with the project.
Planned start date: 22/02/2016
Planned end date: 01/03/2016
Internal task T1: Definition of the project and the time plan
Internal task T2: Elaboration of the Project Proposal and Workplan document
WP#2: Manual classification
Short description: Manual classification of the images that are going to be used to train the classification program.
Planned start date: 01/03/2016
Planned end date: 09/03/2016
Internal task T2: Learn to download the images from Landsat 8
Internal task T3: Use GRASS7 to colour the images

WP#3: Study of Neural Networks
Major constituent: Study of neural networks and how they connect to Python and Keras.
Short description: Get familiarized with concepts such as neural networks, Keras and the libraries it may use, such as Tensorflow and Theano.
Start event: 10/03/2016
End event: 08/04/2016
Internal task T2: Get familiarized with the concept of neural networks
Internal task T3: Study the use of Keras for neural networks
Internal task T4: Get familiarized with neural network examples such as CIFAR-10
WP#4: Programming of the Neural Network
Major constituent: Creation of the model that will be used to classify the images.
Short description: Creation of the optimal neural network model by training it and testing the accuracy of its classification, and continuing to improve it as the number of classes increases.
Planned start date: 08/04/2016
Planned end date: 17/06/2016
Internal task T3: Classification prediction for new images
Internal task T4: Elaboration of the Critical Review
Deliverables: Critical Review (09/05/2016)
WP#5: Deforestation detection
Short description: Classification of two images of the same place at different points in time and detection of the level of deforestation that occurred during that period.
Start event: 26/05/2016
End event: 12/06/2016
Internal task T1: Classify the two images with the model created before
Internal task T2: Compare the classification levels of both images
WP#6: Final experiments
Short description: Carrying out the final experiments of the project and preparing and writing the final report.
Planned start date: 17/06/2016
Planned end date: 08/07/2016
Deliverables: Final report
1.4. Incidences
Two of the goals that we wanted to achieve could not be carried out because of lack of time: the extension of the deforestation monitoring to the whole tropics (although it has been introduced) and the implementation of the algorithm in Terra-I.
The first one turned out to require too much computation time, since it needs a large dataset of images that could not be handled in such a short period.
As for the second one, although the algorithm is planned to be integrated into Terra-I, this will not be done by the end of this period.
1.5. Content
To start the document, a background review of the relevant and recent research on the methodology used is given. In this section, machine learning and classification algorithms are introduced, as well as the data previously used by Terra-I.
In the next section, the machine learning methodologies chosen are explained in more depth, together with the justification of the procedures followed during the project. The main algorithm chosen for the classification is presented and justified.
The following chapter of the document shows that the implementation of this algorithm is functional and, through the results, how it can improve the deforestation detection of the main project.
Finally, a conclusions section at the end of the document relates all the sections and experiments of this project and justifies the usefulness of this research for the future of Terra-I.
2.1. Machine Learning
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science, statistics, artificial intelligence and data science [2].
To relate this field to the needs of this project, this chapter introduces it as applied to image recognition.
2.1.1. Introduction
As stated before, machine learning aims to "make computers learn" about something, much as humans do. Different learning approaches have been explored since the beginnings of machine learning, but they can all be divided into two main categories: supervised and unsupervised learning [2][3].
In the first case, supervised learning, a dataset of input/output examples is provided before the learning process, which means that the process improves with the aim of reproducing desired outputs that are already known. It therefore simulates learning with a teacher, who guides the process until it reaches the desired output.
In the second one, there is no desired output, which allows approaching problems with little or no idea of what the results should look like. There is no feedback, so nothing supervises the process.
Supervised learning problems are categorized into regression and classification problems, whose outputs are continuous and discrete respectively.
In the case of this project, we want to determine whether a specific zone of the planet has suffered deforestation or not. To do so, an output indicating the forest or field type is enough for the computation.
Therefore, since we will have a dataset of examples with known, discrete outputs, we will use a classification approach.
2.1.2 Classification
In classification, the purpose is to determine what the input data is by assigning it to one of several discrete, previously defined values.
In our case, this means defining whether a specific zone of an image corresponds to the field or forest output type or, as it is called in this type of machine learning, class. To do so, the algorithm that performs the classification must have a dataset of examples that relate image sections to output types.
For example, prior to the learning process, the dataset could contain the data shown in figure 2.1 (where 0 corresponds to "forest" and 1 to "field").
This kind of data helps the algorithm learn what forest and field areas look like and how they differ, so that it can create a decision boundary and separate the data corresponding to each class.
Although classification has been introduced here with only two classes, it is also possible to use it with multiple outputs.
In fact, as we kept working on the project we realized that it would be worth adding a new class to the algorithm: clouds. Since we work with satellite images, it is almost impossible to find a full set of images without any clouds. Hence, to improve the classification, we ended up creating an algorithm able to classify data into these three outputs.
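As a minimal sketch (the label values follow the convention used in this project, while the variable names are purely illustrative), the three classes can be encoded as integers and, for the training described later, converted to the one-hot representation expected by Keras:

import numpy as np
from keras.utils import np_utils

# 0 = forest, 1 = field, 2 = clouds (convention used throughout this project)
labels = np.array([0, 0, 1, 2, 1])

# One-hot encoding used as the target of the network trained in chapter 3
one_hot = np_utils.to_categorical(labels, 3)
# one_hot[0] is [1., 0., 0.] (forest), one_hot[3] is [0., 0., 1.] (clouds)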
2.1.3 Neural networks
As introduced before, machine learning appeared with the aim of making machines able to learn, which meant creating some kind of "brain" from a set of algorithms that could imitate an animal brain. That is why, in one of the most important supervised learning techniques, these sets of algorithms are called neural networks.
Since neural networks are inspired by animal brains, their working model is a very simplified version of how we know the brain works.
At a very simple level, we could say that neurons are computational units that turn inputs into outputs by weighting them and applying a so-called "activation function".
A set of neurons forms a layer, which is connected to other layers and can be an input, hidden or output layer.
Input layer: the one feeding data into the neural network.
Output layer: the one containing the final values computed during the learning process.
Hidden layer(s): all the layers in between, guiding the learning process towards the desired output.
All neural networks must have one input layer, one output layer and as many hidden layers as needed.
We can see an example of a neural network for three classes in figure 2.2.
Figure 2.2: Example of neural network for three classes
In the figure we can distinguish the different layers and their sets of nodes. Each node is activated following a mapping function from layer k to layer k+1 controlled by a set of parameters w called weights.
During the learning process, these weights are updated and improved with the help of the dataset. The training goal is to produce the best weights: the ones giving the highest accuracy when performing the classification.
We can also differentiate the bias nodes, in pink, from the other nodes (called neurons) in the same layer. The value of these nodes is always 1.
The reason for this is that their goal is to shift the linear combination represented by the other weights. Thinking of a 1D example, the bias b allows going from y = ax to y = ax + b: in the first case we are limited to lines that pass through the origin, while in the second one all possible lines in the plane can be described.
2.1.1.1. Convolutional neural networks
A convolutional neural network is a type of feed-forward artificial neural network in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. This is why this type of neural network is the one employed to create the classification algorithm.
When starting to use convolutional neural networks, it is very common to create simple networks for CIFAR-10 classification [5]. This is an established computer-vision dataset with 10 object classes and 6000 images per class.
There are three main types of layers used to build convolutional neural network architectures: the convolutional layer, the pooling layer and the fully-connected layer [4].
Figure 2.3 shows a simple example of a convolutional neural network for CIFAR-10. The network starts with an input layer containing the 3 channels (RGB) of a typical 32x32 CIFAR-10 image which, through the processing of the network, ends up being assigned to one of the 10 classes of the model.
In this network there is a fully-connected layer before the output. After the whole process, this layer gives, for each class, the probability that the input image corresponds to it. That is how, at the end, by using a specific function, we can decide which class the input picture belongs to.
Figure 2.3: Example of convolutional neural network for CIFAR-10
The specifications of each layer used during this project are explained in the next chapter.
2.2. Terra-I
To introduce where this project fits within Terra-I, the data currently used by that project is briefly described here.
Until now, Terra-I has worked with data obtained by remote sensing from the 250 m resolution MODIS satellite, and has been using methods that analyse the images temporally, since the resolution was too coarse to develop a spatial analysis.
The data used for the analysis consists of 3 different measurements [6][7] given by MODIS:
- NDVI
- Quality
- TRMM
After collecting these measurements, the point was to use them to predict the future NDVI value for a given point based on the current and previous NDVI and rainfall (TRMM) values.
NDVI
NDVI3 (Normalized Difference Vegetation Index) measures the vegetation of a region by using Near Infrared (NIR) and Visible Red (VIR) measurements in the same zone.
The computation is as simple as:
NDVI = (NIR - VIR) / (NIR + VIR)
This measurement takes into account that live green plants strongly absorb visible light (from 0.4 to 0.7 µm) and strongly reflect near-infrared light (from 0.7 to 1.1 µm).
Although this index is very reliable, it has some problems relevant to this case study. For example, it does not give good results if the sensed area is covered with clouds.
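As a minimal sketch (assuming nir and vir are arrays holding the near-infrared and visible red reflectance of the same zone), the index can be computed as:

import numpy as np

def ndvi(nir, vir):
    # Cast to float to avoid integer division when working with raw band values
    nir = nir.astype(np.float32)
    vir = vir.astype(np.float32)
    return (nir - vir) / (nir + vir)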
Quality
This is a value, also given by MODIS, that indicates the level of precision of the sensor measurement over the sensed area. This value evaluates the quality of the products with respect to their intended performance, pixel by pixel.
In order to discard the bad NDVI results, this index can be vital to determine whether the area may be cloudy or not.
However, this measurement is not always reliable, so clouds are still a problem in the process; actually, the main problem.
In fact, this is the reason why, as briefly introduced in section 2.1.2, while developing this improvement of the project we decided to also detect clouds in the images, so that this information could be discarded in a more reliable way.
This is explained in more detail later in the document.
TRMM
TRMM4 (Tropical Rainfall Measuring Mission) is the last feature used in the process. It measures the level of precipitation on Earth.
The next chapter presents the data now taken from the Landsat 8 satellite and its relation to these measurements.
3. Methodology
This chapter presents the methods used during the project as well as their development and improvement.
3.1. Programming language and work environment
To carry out the project, the language chosen to program the algorithm was Python, and the environment used to program in Python was the IPython notebook5.
This notebook, also known as the Jupyter notebook, is an interactive computational environment in which code execution, text, mathematics and plots can be combined.
In figure 3.1 an example of a Jupyter notebook sheet is shown.
5https://ipython.org/notebook.html
3.1.1. Libraries
Among the large number of machine learning libraries existing nowadays, there are two main ones specifically for the Python language: Scikit-learn6 and Keras7.
Scikit-learn is the most comprehensive machine learning library and the most popular across languages. It is built on top of NumPy and SciPy.
Keras, on the other hand, is a minimalist, highly modular neural network library, capable of running on top of either the Theano8 or Tensorflow9 libraries.
For the development of this project, the library used to create and train the algorithm was Keras, and it was used on top of both Theano and Tensorflow. This is an advantage because each of them is useful for different purposes in neural networks.
The way Keras works is very simple, since its core data structure is a model that organizes layers. The main type of model, and also the one used in this project, is the Sequential model.
As a neural network library, it contains all types of layers, among them the ones used for convolutional neural networks: Convolutional layers, MaxPooling layers, Activation layers and Dense layers, the regular fully-connected layer seen before in this document.
3.2. Data
The data used for the classification are Landsat Surface Reflectance High Level satellite images from Landsat 8, extracted from the USGS Earth Explorer10. These images have a 30 m pixel resolution and have been improved by computing the surface reflectance correction, which makes their analysis easier and more accurate.
The pictures extracted from this satellite are 11-band images [8][9], each band having the particular features shown in table 3.1.
Band 1 (Coastal aerosol): Ultra-blue band. Useful for coastal and aerosol studies.
Band 2 (Blue): Visible blue band.
Band 3 (Green): Visible green band.
Band 4 (Red): Visible red band.
Band 5 (Near Infrared, NIR): Especially important for ecology because plants reflect it.
Band 6 (SWIR 1) and Band 7 (SWIR 2): Useful for telling wet earth from dry earth and for geology (strong contrast between soil and rocks).
Band 8 (Panchromatic): Combines all the visible colours into one channel.
Band 9 (Cirrus): Useful for cirrus cloud detection.
Band 10 (Thermal Infrared, TIRS 1) and Band 11 (Thermal Infrared, TIRS 2): Useful for providing more accurate surface temperatures; collected at 100 metres.
Table 3.1: Band designations for Landsat 8
The previous chapter showed that Terra-I used the NDVI value to distinguish between forested and deforested zones, and that it was computed using the VIR and NIR values. In the table above, those values are given by bands 4 and 5, so it makes sense, for the aim of this project, to use these two bands from the Landsat 8 satellite.
On the other hand, it was also described that a very important feature used in Terra-I was the Quality value, which was determined basically by the cloudiness of the image. To solve the problem of wrong quality values, in this project, as previously stated, a new class called clouds was added instead of using the quality value.
Thus, whether the image is cloudy or not, the classification model is able to make a good deforestation detection by using the information about the clouds.
3.2.1. Images size
To perform the detections, we wanted to classify each pixel of the Landsat images by taking into account a specific region around it, so that the analysis would be spatial.
For this reason, a few tests with small sizes were performed in order to choose the best image size for the classification.
The tests used windows of 64x64, 32x32 and 16x16 pixels around the pixel that we wanted to classify. Figure 3.2 shows the classification attempts with these three sizes.
Even smaller sizes would not make sense, because the model would not be able to differentiate between field and forest: their colours and structures at such small sizes would be very similar, so they are not an option.
After all the attempts, the 16x16 pixels size seemed the best choice, since the model could recognize what was in the image with sufficient precision and the process was much quicker than with the other two sizes.
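As a minimal sketch (landsat_img is the two-band image array used later in this document; the pixel coordinates are purely illustrative), extracting the 16x16 region around a pixel amounts to:

win = 8                                                  # half of the 16x16 window
i, j = 100, 100                                          # hypothetical pixel to classify
patch = landsat_img[i - win:i + win, j - win:j + win]    # 16x16 region with both bands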
Figure 3.2: Image sizes tests, where 0 is forest, 1 is field and 2 is clouds
3.2.2. Images labelling
Working with classification requires having a dataset of examples, together with the desired output for those examples, before the learning process is carried out. Therefore, before starting to program the algorithm, it is necessary to create this dataset. In this case, the dataset must be composed of images whose regions are related to the values 0 (forest), 1 (field) and 2 (clouds).
To do so, there exists a program that makes this very easy, QGIS10. It allows manually labelling the images that will later be used to train and test the neural network model.
Also, a program called GRASS711 can be used to turn the images, which are 11-band images, into RGB by gathering the RGB bands together, making it easier to identify their features.
In figure 3.3 we can see an example of a labelling from a South American region.
10http://www.qgistutorials.com/en/
11https://grass.osgeo.org/grass7/
The results of the labelling are saved in a polygon file that records each selected region as forest, field or cloud. This file must be loaded, together with the image on which the manual classification was made, so that the neural network model can be trained.
Below is an example of loading this file and both bands of the image.
After the loading, the data must be related: the image is used as the input dataset and the polygon file as the desired output.
Thankfully, this step had already been implemented for Terra-I, so for this project the function written then could be reused. This function is called rasterize_label_shp (see Appendix A) and relates the shape layer (extracted from the polygon file) to the region of the globe it belongs to.
Thereafter, the algorithm must be able to relate each output value (0, 1 or 2) from the shape file to the values of bands 4 and 5 at the same point of the picture.
import os
from osgeo import gdal, ogr

# Polygon file with the manual labelling (forest / field / clouds)
shape_dataset = ogr.Open(os.path.join('fields_polygon_227_290.shp'))
shape_layer = shape_dataset.GetLayer(0)

# Bands 4 (red) and 5 (NIR) of the Landsat 8 surface reflectance image
band_4 = gdal.Open(os.path.join('LC82270652015290LGN00_sr_band4.tif'))
band_5 = gdal.Open(os.path.join('LC82270652015290LGN00_sr_band5.tif'))
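Assuming the two GDAL datasets opened above, a minimal sketch of reading both bands into arrays and stacking them into the two-channel image (the landsat_img used for training and classification later in this document) could be:

import numpy as np

b4 = band_4.ReadAsArray().astype(np.float32)   # red band values
b5 = band_5.ReadAsArray().astype(np.float32)   # near-infrared band values
landsat_img = np.dstack([b4, b5])              # shape (height, width, 2)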
Figure 3.3: Example of labelling in a South American region
3.3. Algorithm creation and development
The algorithm to train and validate the convolutional network that performs the desired classification must be defined, as well as the network itself. To do so, the process is divided into two main parts: the search for the optimal model (by creating, training and testing the neural network model) and the visual validation on images.
3.3.1. Optimal model
This section of the project considers the creation of the model together with its training and testing, because it is based on the results of these processes that the model is changed and improved little by little.
To start with the creation of the model, it is very common to use convolutional neural network examples made for the CIFAR-10 dataset (see section 2.1.1.1). In this case we followed the same path, adapting it to our dataset. That means, for example, changing the input layer, since the images chosen for this project are 16x16 pixels unlike the CIFAR-10 ones, as well as the output, since this project only requires 3 classes: forest, field and clouds.
From there, by training and testing the model, the optimal option was chosen.
3.3.1.1. Training of the model
The first thing to do before carrying out the classification, as mentioned before, is to have a dataset of examples from which the algorithm can learn.
Once this is done, the next step is to select from the dataset the sample points that will be used to train the model and the ones that will serve to test it.
The idea is to prove that the model is learning how to classify each region as field, forest or clouds and is not memorizing the whole dataset. This learning process can be graded by looking at the loss and accuracy at each iteration or "epoch" of the training.
Figure 3.4 shows an example of a good learning process, since both the training and validation losses decrease at each iteration while the accuracy increases.
Figure 3.4: Example of a good learning process’ loss and accuracy graphic
As long as the learning process is not good enough, the model must be improved until it is clear that it is not simply memorizing the dataset.
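A hypothetical training call in the Keras 1.x style used throughout this document (model is the Sequential model defined in section 3.3.1.3; X_train/X_test hold the 16x16 patches, Y_train/Y_test their one-hot encoded labels, and the batch size and number of epochs are only illustrative):

history = model.fit(X_train, Y_train,
                    batch_size=32, nb_epoch=20,
                    validation_data=(X_test, Y_test))

# history.history['loss'], ['val_loss'], ['acc'] and ['val_acc'] contain the
# per-epoch values plotted in graphs like the one in figure 3.4.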
3.3.1.2. Classification quality estimation
Once the training is done, in order to truly find the best classification model, a quality estimation of all the candidate models must be performed [7], so that the one giving the best results can be chosen.
To compute this quality estimation, an image from the dataset (previously classified manually) that has not been used in the training process of the model has to be processed.
Comparing the manual classification of the image with the one made by the model, each pixel of the image is assigned, for each of the three classes, to one of the following 4 categories:
True positive (tp): a pixel that is classified by the trained model as the class we are examining and that matches its previous manual classification.
False positive (fp): a pixel that is classified as the class we are examining but actually belongs to one of the other classes.
True negative (tn): a pixel that is not classified as the class we are examining in either classification.
False negative (fn): a pixel that is not classified as the class we are examining even though it belongs to that class according to the dataset.
Once the category of each pixel has been determined for each class, the next step is to compute some measurements.
Precision: the probability that a pixel classified as forest, field or cloud actually belongs to that class: precision = tp / (tp + fp).
Recall: the probability that a pixel belonging to a class is actually classified as that class: recall = tp / (tp + fn).
F1-score: the harmonic mean of precision and recall: F1 = 2 · precision · recall / (precision + recall).
The higher these measurements, the better the model. These are, therefore, the indexes used to determine which model was the best choice for the classification we wanted to achieve.
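As a minimal sketch (using scikit-learn rather than computing the categories by hand; y_true holds the manual labels of the test pixels and y_pred the classes predicted by the model), these measurements can be obtained with:

from sklearn.metrics import classification_report, accuracy_score

print classification_report(y_true, y_pred, target_names=['forest', 'field', 'clouds'])
print accuracy_score(y_true, y_pred)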
3.3.1.3. Definition of the model
After carrying out all the previous steps, training and evaluating several model configurations, the optimal choice created for our purpose is described below.
As explained in section 3.1.1, Keras is structured as a model to which the layers are added in a very simple way. Specifically, in this model those layers are:
Convolution 2D: in this case, it has 16 different 3x3 filters (randomly initialised) that are convolved with the 2-dimensional input units in order to extract their main features.
MaxPooling 2x2: for each 2x2 region of the previous layer's feature maps it keeps the maximum value, halving the size of the images.
Dropout 0.25: it sets a fraction of 0.25 of the input units to 0, which helps prevent overfitting.
Flatten: flattens the input without affecting the batch size.
Dense n: a fully-connected layer that maps its input to an output of size n.
Activation ReLU: a function that sets all negative values to 0 and leaves the positive ones unchanged. It does not affect the size or the batch size.
Activation Softmax: a function that maps the input values to the range [0, 1] so that they sum to 1 and can be read as class probabilities. It does not affect the size or the batch size.
A more visual representation of this model is shown in figure 3.5.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import Adam

nb_classes = 3
input_shape = X_train.shape[1:]   # e.g. (2, 16, 16): both bands of a 16x16 patch

model = Sequential()

# Three 3x3 convolutional layers with 16 filters each, interleaved with
# max pooling and dropout
model.add(Convolution2D(16, 3, 3, input_shape=input_shape, activation='relu', name='conv1'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(16, 3, 3, activation='relu', name='conv2'))
model.add(Convolution2D(16, 3, 3, activation='relu', name='conv3'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Fully-connected part ending in one softmax output per class
model.add(Flatten())
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# Adam optimizer and categorical cross-entropy loss for the 3-class problem
sgd = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
Following the image, the process can be more easily understood.
The 16x16 pixels input goes through several layers that extract its most relevant features and, thereafter, the model gives the probability of the image being of the forest, field or clouds type. Once it has gone through the softmax activation, the model decides which of them is the most likely.
As seen in section 3.3.1.1, once the model is created and chosen it must be trained until the relation between the layers is optimal for extracting the most important information from the input image and, thus, classifying it as accurately as possible.
After that, the final and optimal weights of each layer, which achieve the best classification, are saved by executing the following line:
model.save_weights('model_weights.hdf5')
Note: as we can see, the model weights are saved in the HDF5 format. This is a hierarchical data format designed to store and organize large amounts of data.
3.3.1.3.1. Features extraction
Once the model is trained and saved, it is possible to look at the input that maximizes each filter, i.e. the features it extracts, in order to understand how the model manages to classify each picture well [10].
This can also be useful for improving the model, by removing layers that turn out to be useless or changing them so that more features can be extracted.
Figure 3.6 shows a compilation of the features extracted by the filters of the first convolutional layer.
Observing the image above, we notice that the convolutional filters mainly try to extract colours from the input units. We can conclude that the best feature that can be extracted from them is colour.
In fact, if we look at the values of both bands for forest, field and clouds, we see that they are so different that this feature alone can be enough to distinguish between the three classes. Figures 3.7 and 3.8 show evidence of this for some random input data from a South American image.
data of a South American image.
Figure 3.6: First convolutional layer filter’s extracted features for both bands
Figure 3.7: Band 4 values for forest, field and clouds
This is why we can say that our model mainly uses the colour of the images to carry out the classification.
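A minimal sketch of the gradient-ascent visualisation described in [10] (assuming the layer name 'conv1' from the model definition above and Theano-style channels-first inputs; the filter index, step size and number of iterations are only illustrative): starting from a random 16x16 two-band patch, the input is modified so that it maximises the mean activation of one filter of the first convolutional layer.

import numpy as np
from keras import backend as K

layer_dict = dict([(layer.name, layer) for layer in model.layers])
filter_index = 0

# Loss to maximise: mean activation of one filter of the first convolutional layer
loss = K.mean(layer_dict['conv1'].output[:, filter_index, :, :])
grads = K.gradients(loss, model.input)[0]
grads /= K.sqrt(K.mean(K.square(grads))) + 1e-5            # normalise the gradient
iterate = K.function([model.input, K.learning_phase()], [loss, grads])

input_img = np.random.random((1, 2, 16, 16)).astype(np.float32)
for _ in range(50):
    loss_value, grads_value = iterate([input_img, 0])       # 0 = test phase (no dropout)
    input_img += grads_value * 0.1                          # gradient ascent step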
3.3.1.3.2. Logistic Regression
Actually, when we looked at the extracted features, we realized that it was easier for the model to classify the images following a colour pattern rather than a structural one, which is what we had been seeking.
When we noticed this, we realized that we did not need a very complex model to achieve the classification, so everything turned out to be much simpler.
In scikit-learn there exists a simple model, logistic regression, that is already implemented and ready to use for classification. Since at the end of the project we ended up with a very simple model, we decided to run some experiments with both our model and the logistic regression model, to compare the results and show that a convolutional neural network introduces a noticeable improvement.
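A minimal sketch of that comparison model (assuming the same 16x16 two-band patches used for the network, flattened into vectors, with y_train/y_test holding the class indices 0/1/2):

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X_train.reshape(len(X_train), -1), y_train)
print logreg.score(X_test.reshape(len(X_test), -1), y_test)   # mean accuracy on the test patches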
This comparison is further discussed in the next chapter of this document.
3.4. Visual validation on images
Once the model is chosen and saved, the next step is to write the code necessary to carry out the classification process using the model.
To be truly useful, this classification must be performed on a brand new image whose data the model has not seen or used before, which means that the model knows nothing about it.
First of all, since the satellite images tend to be very large (usually 7631x7781 pixels), classifying them entirely at once could take a lot of computation time, so cutting them into, for example, 200x200 pixels pieces can considerably ease the process.
Then, for each small 200x200 pixels piece, every pixel is classified by taking into account its surroundings, which means taking the 16x16 pixels region (see section 3.2.1) around every single pixel and passing it through the model to obtain its value.
This process is repeated along the whole image until it is fully classified.
Figure 3.9 illustrates this process on a satellite image, without taking into account the real size of the image.
Figure 3.8: Band 5 values for forest, field and clouds
The part of the code that carries out this classification is shown below:

import sys
import numpy as np

# Sliding-window classification: `win` is half the size of the 16x16 window,
# `it` the side of the blocks processed at once, and `output` an array with
# the same height and width as the input image.
for l in range(win, x_final - win, it):
    for m in range(win, y_final - win, it):
        print l, m
        sys.stdout.flush()
        X_predict = []
        # Limits of the current block, clipped at the borders of the image
        if x_final - win - l >= it and y_final - win - m >= it:
            x_fin = l + it
            y_fin = m + it
        elif x_final - win - l < it or y_final - win - m < it:
            x_fin = x_final - win
            y_fin = y_final - win
            if x_final - win - l > it:
                x_fin = l + it
            elif y_final - win - m > it:
                y_fin = m + it
        # Gather the 16x16 window around every pixel of the block
        for i in range(l, x_fin):
            for j in range(m, y_fin):
                subimg = landsat_img[(i - win):(i + win), (j - win):(j + win)]
                X_predict.append(subimg.T)
        # Same normalisation as used for the training data
        X_predict = np.array(X_predict).astype(np.float32)
        X_predict /= 255.0
        X_predict = (X_predict - X_mean) / X_std
        # Predict the class of every pixel of the block at once
        predict_label = model.predict_classes(X_predict, verbose=0)
        predict_label = predict_label.reshape(x_fin - l, y_fin - m)
        output[l:x_fin, m:y_fin] = predict_label
Figure 3.9: Representation of the classification process with the trained model
In the previous code, x_final and y_final correspond to the height and width of the 2-dimensional image, landsat_img is the combination of the B4 and B5 bands of the satellite image to classify, and output is the final result of the classification for each pixel of the input image (at the end it has the same size as the input image).
Figure 3.10 shows what the output should look like after this process for the image shown in figure 3.9, where blue corresponds to forest, green to field and red to clouds.
Notice that the model has some trouble detecting clouds. This is discussed in the following section.
Figure 3.10: Output values after the classification process on figure 3.9
4. Results
This chapter presents the results of the experiments carried out in order to validate the project.
As introduced in section 1.1, one of the main goals of this project was to achieve deforestation detection in a spatial way, although section 3.3.1.3.2 introduced the issue of the simplicity of the model.
For these reasons, the experiments were carried out with the purpose of showing the improvement that the development of this project may represent for Terra-I.
4.1. Training with the final model and Logistic Regression model
The first test aimed to show the improvement achieved by our model compared to a trained logistic regression model.
For this purpose, we decided to perform two experiments on the satellite image shown in figure 4.1, from a region in Brazil, South America, which we had already classified manually.
Figure 4.1: South American satellite zone and image used for this experiment
In order to perform the experiments, we used three different images from South America as the dataset to train and test the models, among them the image shown above.
These images were manually classified beforehand, producing the following labellings:
Figure 4.2: labelling for the region 227_65_290 Figure 4.3: labelling for the region 227_65_172
Figure 4.4: labelling for the region 1_86
4.1.1. Training with upper half
The first experiment consisted of training both models separately with the upper half of the three images and then using them to classify a region in the lower half of the image shown in figure 4.1. The reason for choosing a region in the lower half is to make sure that the models could not see its classification during the training. This region is shown in figure 4.5.
The whole process can be found in the notebook LoadWeights Training final image WITH and WITHOUT LogisticRegression_up HTML.
4.1.1.1. The model created during this project.
After training our model with random 16x16 pixels examples from the dataset of the three images, the classification report of the model gave the results shown in table 4.1.
Overall accuracy: 0.931111111111
Table 4.1: Classification report of our model for the 1st training
This table shows the precision, recall and F1-score resulting from testing 3000 examples of forest, 3000 of field and 3000 of clouds. These measurements allow grading the accuracy of a model (see section 3.3.1.2).
Figure 4.5: Region classified in the experiment "training with the upper half"
After the validation on the region shown in figure 4.5, the result is the one shown in figure 4.6.
4.1.1.2. Logistic Regression
In the case of logistic regression, after training with the same 9000 examples employed in the previous section, the classification report of the model gave the results shown in table 4.2.
Overall accuracy: 0.879444444444
Table 4.2: Classification report of logistic regression for the 1st training
After the validation on the same region of figure 4.5, the result is the one shown in figure 4.7.
Figure 4.6: Result of the classification with our model for the 1st training
Comparing the results of both cases, even though there is not much difference, at least in the measurement values it is clear that the model we created gives better classification results than logistic regression in this case.
4.1.2. Training with lower half
The second experiment consisted of trying the same thing as before but the other way around. This time the training was performed with the lower half of the three images and the classification took place on a region in the upper half of the image. This region is shown in figure 4.8.
The whole process can be followed in the notebook LoadWeights Training final image WITH and WITHOUT LogisticRegression_down HTML.
Figure 4.7: Result of the classification with logistic regression for the 1st training
Figure 4.8: Region classified in the experiment "training with the lower half"
4.1.2.1. The model created during this project
After training our model with new random examples from the dataset of the three images, the classification report of the model gave the results shown in table 4.3.
Overall accuracy: 0.957222222222
Table 4.3: Classification report of our model for the 2nd training
These results are even better than those obtained in the previous experiment, and so is the classification done with this model on the upper region shown in figure 4.8, as can be seen in figure 4.9.
4.1.2.2. Logistic Regression
In the case of logistic regression, as seen in table 4.4, the results are also better than before, although they are still worse than our model's.
Figure 4.9: Result of the classification with our model for the 2nd training
Overall accuracy: 0.896555555556
Table 4.4: Classification report of logistic regression for the 2nd training
The classification associated with this case is shown in figure 4.10.
As seen in the figure, comparing that result to the classification performed by our model there is a great difference, larger than in the previous experiment. However, the measurement values have increased this time.
The reason why this can happen is that the testing (and therefore the measurements) is computed over the three images, while the validation only takes place on one of them, which happens to be the one with the most clouds and, therefore, the most likely to be wrongly classified.
4.1.3. Results
After these two experiments, we can conclude that the improvement of our model in the classification is very noticeable, so using our model instead of the logistic regression model to classify the images can make a difference.
Figure 4.10: Result of the classification with logistic regression for the 2nd training
4.2. Deforestation detection
Another of the main goals of this project, as stated in section 1.1, was to improve the deforestation detection in South America and later extend it to the whole tropics.
Since the project did not last very long, the extension to the whole tropics could not be carried out. However, the possibility of doing so in the future remains open.
Nonetheless, the detection in South America could at least be performed, and therefore this experiment was carried out.
To achieve a good and accurate detection, we used 4 images of the same place at different points in time. They were chosen from two different years (2013 and 2015) and, within each year, from the two different tropical seasons11.
The point of choosing images from the two seasons was to achieve a good classification at any time of the year.
4.2.1. Classification of all the images
In order to perform the classification, since the region examined was the same as in the previous experiments, we chose the model that gave the best results in the tests, which is the one trained with the lower half of the input images (see section 4.1.2).
We did this because the chosen model was trained with both dry and wet season images, so we could see whether it was enough to perform the classification for both seasons.
The classification, separated between wet season and dry season, gave the following
results:
Figure 4.11: Classification for 2013 image from wet season
11https://en.wikipedia.org/wiki/Tropics
Figure 4.12: Classification for 2015 image from wet season
Figure 4.13: Classification for 2013 image from dry season
Figure 4.14: Classification for 2015 image from dry season
4.2.2. Detections
As can be seen in the 4 classifications, the clouds are the hardest to detect, even though we introduced the class to avoid the quality problems that previously existed in Terra-I. However, since these classifications by themselves do not show whether there has been deforestation between 2013 and 2015, this might not be a problem.
More specifically, the reason why we decided to perform the detections on four images instead of two was precisely to correct the possible mistakes that could appear in the classifications.
That is, if we only computed one detection (as the difference between both years), failures due to wrong classification would be much more likely than if we computed two.
For this reason, once the deforestation detection for each season was performed separately, obtaining the results shown in figures 4.15 and 4.16, the difference between these two was computed to reach the most accurate deforestation detection possible.
As can be seen in both figures, the detection consists of assigning the value 0 to all the places that have the same value in both years and to all the places hidden by clouds, and assigning 1 to the places where fields have appeared according to the difference between both years' classifications.
Notice that the detections contain some failures due to clouds and to forest that was
classified as field. If we only performed this first detection, its accuracy would be
really low.
Instead, computing the difference between these two detections, although it yields fewer
deforestation fronts in the image, is more precise even if, as we have seen, one of the
images is full of clouds.
Figures 4.17 and 4.18 show this difference on the 2013 image (top) and the 2015 image for
the wet season. Figures 4.19 and 4.20 show the same for the dry season.
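To make the procedure concrete, the sketch below reproduces the detection rule described above; the class ids are assumptions, and the combination of the two seasonal detections is written as a per-pixel AND, which is one way to obtain the fewer but more precise detections described in the text.

# Sketch of the per-season detection and of the combination of both seasons
# (class ids and the AND-style combination are assumptions, not the exact notebook code).
import numpy as np

FOREST, FIELD, CLOUD, WATER = 0, 1, 2, 3   # assumed class ids produced by the classifier

def season_detection(class_2013, class_2015):
    """1 where a field has appeared between the two years, 0 elsewhere;
    pixels hidden by clouds in either year are forced to 0."""
    appeared = (class_2015 == FIELD) & (class_2013 != FIELD)
    clouded = (class_2013 == CLOUD) | (class_2015 == CLOUD)
    return (appeared & ~clouded).astype(np.uint8)

def combined_detection(maps_2013, maps_2015):
    """maps_* are dicts {'wet': class_map, 'dry': class_map}; a pixel is kept as a
    deforestation front only if both seasonal detections flag it."""
    det_wet = season_detection(maps_2013['wet'], maps_2015['wet'])
    det_dry = season_detection(maps_2013['dry'], maps_2015['dry'])
    return det_wet & det_dry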
Figure 4.15: Deforestation detection wet season Figure 4.16: Deforestation detection dry season
Figure 4.17: Final detection on 2013 image for wet season
Figure 4.18: Final detection on 2015 image for wet season
Figure 4.19: Final detection on 2013 image for dry season
Figure 4.20: Final detection on 2015 image for dry season
After all the detections, as can be seen in the previous images, some of the real detections
are very accurate and, although the clouds still introduce some failures, the improvement
is very notable.
Notice that the 2013 image from the wet season is smaller than the other three. This is a
problem of the satellite image itself, so it is expected that detections appear where the
image should end.
The whole process can be found in the notebook LoadWeights Training final image
227_65_HTML.
4.2.3. Global Forest Watch
Global Forest Watch12 is an interactive online platform designed to monitor forests and to
alert about new deforestation fronts.
It works with information from many organizations (including Terra-I) that periodically
update their deforestation detections.
The platform publishes a Tree Cover Loss map of the whole world every year with very high
accuracy, but a year is too long a time to react to new deforestation fronts. This is why
Terra-I's main goal is to achieve an accuracy as high as that one but with an update every
16 days.
For this reason, to really observe the improvement that this project might bring to
Terra-I, the last step of this experiment was to compare our detections (figure 4.21) with
the detections registered by this yearly tree cover loss (figure 4.22), cropped to the
same region of the image.
As can be seen in the images, the detections are very much alike, despite some differences
probably due to the clouds in the images used for our detections.
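A rough way to quantify this visual comparison, assuming both detections are available as binary masks on the same pixel grid, is sketched below; the function and argument names are purely illustrative.

# Sketch (assumed inputs): agreement between our detection and the GFW tree cover loss,
# ignoring the pixels that are covered by clouds in our Landsat images.
import numpy as np

def agreement_with_tree_cover_loss(our_detection, gfw_loss, cloud_mask):
    """All arguments are boolean arrays on the same pixel grid. Returns the fraction
    of GFW loss pixels that our detection also flags, excluding cloudy pixels."""
    valid = ~cloud_mask
    flagged_by_both = np.sum(our_detection & gfw_loss & valid)
    flagged_by_gfw = np.sum(gfw_loss & valid)
    return flagged_by_both / float(flagged_by_gfw) if flagged_by_gfw else float('nan')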
12 http://www.globalforestwatch.org
Figure 4.21: Our detection Figure 4.22: Tree cover loss detection
Notice that the region where the two detections differ the most is precisely the region
covered by the cloud in the 2015 wet-season image.
Therefore, although it is not perfect, the improvement in the detection despite the clouds
is quite notable, so without clouds it could be even better.
Thus, we can conclude that detection over South America is possible with this new
algorithm, even with cloudy images, and with fairly good results.
5. Budget
In this project no physical prototype has been designed, so no components have been
needed. Therefore, this budget has been estimated according to the number of hours
dedicated to the thesis in the different work packages.
The budget is evaluated at the cost of a junior engineer, which has been set at
8 €/hour.
Task                                 Weeks   Mean hours/week   Cost per hour (€)   Cost (€)
Manual classification                   1         20                  8               160
Study of Neural Networks                4         20                  8               640
Programming of the Neural Network      11         20                  8              1760
Results and Final Report                4         25                  8               800
TOTAL                                  20         22                  -              4200 €
6. Environment Impact
The main theme of this project is directly related to environmental impact. As stated in
section 1, the Terra-I project started as a solution to the coarse monitoring of habitat
loss.
Other organizations try to do the same as Terra-I, but many of them update their
forest-loss information only over very long periods of time, which often leads to belated
reactions to new alerts.
Therefore, Terra-I and this project aim to be useful for habitat-loss alerts by updating
their information every 16 days, so that reactions by governments and environmental
organizations to increased deforestation can be sped up.
7. Conclusions and future development
Regarding the goals of the project, it started with the aim of improving the deforestation
detection of Terra-I by exploiting the additional information that the new satellite images
from landsat8 could give, for example their 30 m resolution compared with the 250 m
resolution of the satellite used until now.
As a result, the process could be improved by also taking into account the spatial
information around each pixel in the classification.
Another major goal was to find a solution to the cloud problem in Terra-I and, even though
our solution is not perfect yet, we can say that this problem was also tackled in this
project.
Therefore, with this project we could show the usefulness of a new spatial analysis of the
images, and we opened up the possibility of carrying it out across South America and,
later, the whole tropics.
Concerning the results of the experiments, the last point deserves special attention: the
comparison with the Tree Cover Loss project. This is a very good detection that takes place
every year, so seeing that the detections produced with our model are similar to it gives
hope of one day achieving the same accuracy with this spatial analysis, at least in South
America.
Regarding the extension to the whole tropics, during the project we realized that it was
going to take much more time than expected, so in the end it could not be performed.
This is because, due to the different vegetation that may be present in Africa, South
America and Indonesia, it may be very difficult to find a single model that can classify
images from the three continents.
However, at some point in this project we thought about how to reduce the complexity of
such a problem, which could involve heavy computation, and we actually found an option.
Since the vegetation patterns across the tropics are quite similar to each other, a good
start could be to use a few images covering all types of tropical vegetation and try to
detect deforestation over that whole zone.
Finally, regarding the implementation in Terra-I, it requires much more time than we had
for this project, so it is something that can be done in future projects.
Bibliography
[1] L. Reymondin, A. Jarvis, A. Perez-Uribe, J. Touval, K. Argote, A. Coca, J. Rebetez, E. Guevara, M. Mulligan. "A methodology for near real-time monitoring of habitat change at continental scales using MODIS-NDVI and TRMM". Terra-I, an eye on habitat change, December 2012, pp. 4-9.
[2] M.I. Jordan, T.M. Mitchell. “Machine learning: Trends, perspectives, and prospects”. Science, vol. 349,
no. 6245, pp. 255-260, July 15, 2015. DOI: 10.1126/science.aaa8415.
[3] Y. LeCun, Y. Bengio, G. Hinton. “Deep learning”. Nature, vol. 521, pp.436-444, May 28, 2015. DOI:
10.1038/nature14539.
[4] A. Karpathy. “Convolutional Neural Networks (CNNs/ConvNets)”. [Online]. Available: http://cs231n.github.io/convolutional-networks/
[5] A. Krizhevsky. “Convolutional Deep Belief Networks on CIFAR-10”. August 2010.
[6] J. Rebetez. “Parasid, radar écologique pour l’Amazonie”. Bachelor thesis, Haute Ecole d’Ingénierie et de Gestion du Canton de Vaud, Switzerland, 2009.
[7] K. Georgy. “Terra-I on Earth Engine”. M.S. thesis, Haute Ecole d’Ingénierie et de Gestion du Canton de Vaud, Switzerland, 2012.
[8] “Band designations for landsat satellites: Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)”. [Online]. Available: http://landsat.usgs.gov/band_designations_landsat_satellites.php
[9] “Landsat 8 bands”. [Online]. Available: http://landsat.gsfc.nasa.gov/?page_id=5377
[10] "How convolutional neural networks see the world". [Online]. Available: http://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
"""
`dtype` is a gdal type like gdal.GDT_Byte
`options` should be a list that will be passed to GDALRasterizeLayers papszO
ptions, like
["ATTRIBUTE=vegetation"]
ne):
"""
Loads the given shapefile with labelled polygon and rasterize it to an image
Args:
model_dataset: the rasterized label image will have the same shape as mo
del_dataset
label_fieldname: the shapefile attribute that contains the label of each
polygon
56
label2id: if not None, a dict mapping label name to id. This should cont
ain *all*
the name that will be encountered when loading this file (and maybe
some that will not
Returns:
labels : an uint8 masked array containing the label of each pixel
(unlabelled pixels are masked)
label2id : a dict mapping label name to id (the same as label2id if it w
as passed)
shape_layer.SetAttributeFilter(None)
shape_layer.ResetReading()
label2id[label] = lid
assert len(np.setdiff1d(loaded_label2id.keys(), label2id.keys())) == 0
# Use attribute filters to rasterize shapes with a given label one by one an
d assign a different lid
# to each
OUCHED'])
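Assuming the reconstruction above is close to the original helper, it would be used roughly as follows; the file and attribute names here are purely illustrative.

# Hypothetical usage (file and field names are examples, not the project's actual data).
from osgeo import gdal

model_ds = gdal.Open('landsat8_scene.tif')       # raster that defines the target pixel grid
labels, label2id = load_shapefile_labels('training_polygons.shp', model_ds, 'vegetation')
print(label2id)                                  # e.g. {'forest': 1, 'field': 2, 'cloud': 3}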