

UNIVERSITY OF OULU, P.O. Box 8000, FI-90014 University of Oulu, Finland

ACTA UNIVERSITATIS OULUENSIS

Series editors:

University Lecturer Tuomo Glumoff
University Lecturer Santeri Palviainen
Postdoctoral Research Fellow Sanna Taskila
Professor Olli Vuolteenaho
University Lecturer Veli-Matti Ulvinen
Planning Director Pertti Tikkanen
Professor Jari Juga
University Lecturer Anu Soikkeli
Professor Olli Vuolteenaho

Publications Editor Kirsti Nurkkala

ISBN 978-952-62-1850-2 (Paperback)
ISBN 978-952-62-1851-9 (PDF)
ISSN 0355-3213 (Print)
ISSN 1796-2226 (Online)



ACTA UNIVERSITATIS OULUENSIS
C Technica 650

ANSSI KEMPPAINEN

ADAPTIVE METHODS FOR AUTONOMOUS ENVIRONMENTAL MODELLING

Academic dissertation to be presented with the assent of the Doctoral Training Committee of Technology and Natural Sciences of the University of Oulu for public defence on Tellus Stage, Linnanmaa, on 5 April 2018, at 12 noon.

UNIVERSITY OF OULU, OULU 2018


Copyright © 2018
Acta Univ. Oul. C 650, 2018

Supervised by
Professor Juha Röning

Reviewed by
Associate Professor Fabio Ramos
Assistant Professor Giorgio Grisetti

Opponent
Professor Wolfram Burgard

ISBN 978-952-62-1850-2 (Paperback)
ISBN 978-952-62-1851-9 (PDF)
ISSN 0355-3213 (Printed)
ISSN 1796-2226 (Online)

Cover Design
Raimo Ahonen

JUVENES PRINT
TAMPERE 2018


Kemppainen, Anssi, Adaptive methods for autonomous environmental modelling.
University of Oulu Graduate School; University of Oulu, Faculty of Information Technology and Electrical Engineering
Acta Univ. Oul. C 650, 2018
University of Oulu, P.O. Box 8000, FI-90014 University of Oulu, Finland

Abstract

In this thesis, we consider autonomous environmental modelling, where robotic sensing platforms are utilized in environmental surveying. In order to support a wide range of different environments, our models must be flexible to the data under some a priori assumptions. Correspondingly, in order to guide action planning, we need a unified sensing quality metric that depends on the prediction quality of our models. Finally, in order to adapt to the observed information, at each iteration of the action planning algorithm we must be able to provide solutions that aim at the minimum travelling time needed to reach a certain level of sensing quality. These are the main topics of this thesis.

At the center of our approaches are stationary and non-stationary Gaussian processes based on the assumption that the observed phenomenon is due to the diffusion of white noise, where the diffusion kernel anisotropy and scale may vary between locations. For these models, we propose adaptation of diffusion kernels based on a structure tensor approach. The proposed methods are demonstrated with experiments which show that, assuming sensor noise is not dominating, our iterative approach is able to return diffusion kernel values close to the correct ones.

In order to quantify how precise our models are, we propose a mutual information based sensing quality criterion, and prove that the optimal design using our sensing quality provides the best prediction quality for the model. To incorporate localization uncertainty in modelling, we also propose an approach where the posterior model is marginalized over the sensing path distribution. The benefit is that this approach implicitly favors actions that lead to previously visited or otherwise well-defined areas while maximizing the information gain. Experiments support our claims that our proposed approaches are best when considering predictive distribution quality.

In action planning, our approach is to use graph-based approximation algorithms to attain a certain level of model quality in an efficient way. In order to account for spatial dependency and active localization, we propose adaptation methods that map sensing quality to vertex prices in a graph. Experiments demonstrate the benefit of our adaptation methods compared to action planning algorithms that do not consider these specific features.

Keywords: action planning, approximation algorithm, diffusion kernel, Gaussian process, localization, mutual information, non-stationarity, robotic sensing, sensing quality


Kemppainen, Anssi, Adaptive methods for autonomous environmental modelling.
University of Oulu Graduate School; University of Oulu, Faculty of Information Technology and Electrical Engineering
Acta Univ. Oul. C 650, 2018
University of Oulu, P.O. Box 8000, FI-90014 University of Oulu, Finland

Tiivistelmä (Abstract in Finnish)

This dissertation considers autonomous environmental modelling, where robotic sensing platforms are utilized in surveying the environment. To cover different kinds of environments, the models used must be flexible to the data under certain a priori assumptions. Guiding the sensing platforms correspondingly requires a unified sensing quality metric that depends on the prediction quality of the models. In order to adapt to new information, the planning algorithm must additionally, at each iteration, aim at minimizing the travel time needed to reach a given sensing quality. These are the main topics of this dissertation.

At the center of this dissertation are stationary and non-stationary Gaussian processes based on the assumption that the observed phenomenon is due to the diffusion of white noise. The anisotropy and scale of the diffusion kernel are allowed to depend on location. This dissertation presents structure tensor based methods for adapting these models. The conducted experiments show that the presented iterative adaptation methods produce nearly correct diffusion kernel values, provided that sensor noise does not dominate the measurements.

For determining the prediction accuracy of the models, a sensing quality metric based on mutual information is presented. The dissertation proves that the optimal prediction quality is attained by using the presented quality metric. The dissertation additionally presents a quality metric in which the posterior model is marginalized over the distribution of sensing paths. This makes it possible to account for the effects of localization uncertainty in modelling. The benefit is that this quality metric implicitly favors those platform actions that lead to previously mapped or easily predictable areas while maximizing the information gain. The conducted experiments support the claims that the methods presented in this dissertation produce the best predictive distribution quality.

For guiding the sensing platforms, this dissertation utilizes graph-based approximation algorithms, with which the required model prediction quality is attained efficiently. The dissertation presents adaptation methods for mapping the sensing quality to the vertex prices of a graph. This makes it possible to account for both spatial dependency and active localization. The conducted experiments demonstrate the advantages of the presented adaptation methods relative to planning algorithms that do not consider these specific features.

Keywords: approximation algorithm, diffusion kernel, non-stationarity, Gaussian process, sensing quality, mutual information, action planning, localization, robotic sensing


This thesis is dedicated to the nature and life inspiring me to learn and sense all their forms.


Preface

This dissertation is original, unpublished, and independent work by the author, A. Kemppainen.


Acknowledgements

Firstly, I would like to express my gratitude to my supervisor Prof. Juha Röning for his patience and trust that helped me to finish my PhD studies. My sincere thanks also go to Dr. Janne Haverinen and my colleague Ilari Vallivaara, who helped me with my publications.

I would also like to thank my parents, who have always expressed great respect towards my research career. My biggest gratitude goes to Yarah, who gave me a clear reason to finish these studies.

Oulu, 9th of November 2017
Anssi Kemppainen


List of symbols and abbreviations

Mathematical notations

log(x)      natural logarithm of x
exp(x)      natural exponential of x
H(X)        entropy of a random variable X w.r.t. the natural base e
H(X|Y)      conditional entropy of a random variable X, given Y, w.r.t. the natural base e
H_m(X)      entropy of a random variable X w.r.t. the base m
h(X)        differential entropy of a random variable X
s_n         a set with n elements
|S|         the size of a set S
E(X)        expected value of a random variable X
cov(X,Y)    covariance between random variables X and Y
Var(X)      variance of a random variable X
∇f(s)       gradient of function f at s
f*          transpose (dual) of a vector f
A*          transpose (adjoint) of a linear operator A
A*          Hermitian adjoint of a linear operator A on a complex Hilbert space
F           Fourier transform
FFT         Fast Fourier Transform
I(X;Y)      mutual information between random variables X and Y
A∪B         union of sets A and B
A∩B         intersection of sets A and B
A\B         set difference, i.e., the subset of A not included in B
N(µ,Σ)      multivariate normal distribution with mean µ and covariance Σ
N(x;µ,Σ)    multivariate Gaussian density function of a variable x with mean µ and covariance Σ
⟨x,y⟩       inner product between x and y
f∘g         function composition from g to f
X≅Y         isomorphism between X and Y

Latin letters

F       stochastic process, usually assumed to be a Gaussian process
f       realization of a stochastic process
W       measurement noise process
Z_A     random measurement vector for design A
z_A     measurement for design A
X_A     design matrix for design A
x_i     ith standard vector for design A
X       standard basis
X_A     random sensing path for design A
x_A     sensing path for design A
u_A     sequence of actions for design A
M       stochastic motion model
m       motion model
G       graph
V       set of vertices
E       set of edges
c(e)    cost of edge e
c(p)    cost of path p
N       set of natural numbers
R       set of real numbers

Greek letters

Ω       sample set
ω       sample
Ω       white noise process
ω       realization of white noise
σ_f     standard deviation of white noise
σ_n     standard deviation of sensor noise


Contents

Abstract
Tiivistelmä
Preface
Acknowledgements
List of symbols and abbreviations
Contents
1 Introduction
  1.1 Problem statement and the main contributions
  1.2 Environmental modelling
    1.2.1 Gaussian processes
    1.2.2 Frequency domain Gaussian process regression
    1.2.3 Spatial moving average models
    1.2.4 Model adaptation
  1.3 Sensing quality
    1.3.1 On entropy and mutual information
    1.3.2 On optimality of sensing designs
    1.3.3 On complexity of sensing algorithms
  1.4 Autonomous sensing
    1.4.1 Sensing with adaptive environmental models
    1.4.2 Time-efficient sensing
    1.4.3 Sensing under positioning uncertainties
2 Model adaptation under sequential sensing
  2.1 Related work
  2.2 Diffusion kernel adaptation
    2.2.1 Computing structure tensor
    2.2.2 Accounting for noisy samples
    2.2.3 Working with unknown signal variation
    2.2.4 Experiments
  2.3 Moving average kernel adaptation
    2.3.1 Computing structure tensor
    2.3.2 Accounting for noisy samples
    2.3.3 Experiments
  2.4 Kernel adaptation under sparse sampling
    2.4.1 Experiments
3 Quantifying sensing quality
  3.1 Related work
  3.2 Mutual information criterion
    3.2.1 Experiments
  3.3 Accounting for localization uncertainties
    3.3.1 Sensing path distributions
    3.3.2 Experiments
4 Action planning
  4.1 Related work
  4.2 Design space for action planning
  4.3 Vertex price generation
  4.4 Informative path planning algorithm
    4.4.1 Informative shortcutting heuristic
    4.4.2 Experiments
  4.5 Path planning with sensing path uncertainties
5 Conclusions
  5.1 Model adaptation
  5.2 Quantifying sensing quality
  5.3 Action planning
References


1 Introduction

Life and the environment we live in owe their existence to the processes that maintain the viability of our planet Earth. The same processes also change the environment constantly. Unlike the solar system, whose macro-scale behaviour can be predicted accurately over the long term, the processes found in nature are open systems that interact with each other and involve a multitude of unobserved variables, often yielding chaotic systems. Even though some of these processes, for example surface layering, exhibit longer temporal stability, in general we do not have enough historical information to predict locally the spatial distribution of the phenomena (e.g. mineral deposits). On the other hand, temporally slow terrain processes may also have included rapid changes due, for example, to volcanic activity. Even when environmental processes have attained dynamic equilibrium, meaning that the entropy change of the spatial process is zero, they still usually exhibit spatial heterogeneity. Stable heterogeneity follows from balanced external forces, including the planet's inner heat, gravity and magnetic field. Altogether, geostatistics tries to solve the difficult task of environmental modelling by utilizing statistical models with incomplete and noisy information.

The key idea behind geostatistics is that of the spatio-temporal dependency of phenomena. The stronger the dependency, the easier the phenomena are to model, since fewer samples are needed to predict values at a given spatial or temporal scale. Positive dependency means that nearby values in space and time are more alike than those far away. Local positive dependency applies to isolated systems and systems under diffusion processes, such as isolated lakes, stable soil, indoor gases, and so forth, which exhibit local spatial homogeneity and are usually close to dynamic equilibrium states. This thesis builds on these assumptions in developing efficient methods aimed at different autonomous environmental modelling applications.

Our needs arise from two real-world applications: indoor magnetic field and lake-floor mapping. Magnetic field mapping is inspired by the observation that steel and iron supports in building structures create anomalies in the geomagnetic field that can be utilized for indoor localization. Lake-floor mapping, on the other hand, is typically conducted to provide structural information, i.e. the composition and vegetation of the bottom of a lake, or to estimate the water volume of a pond or lake.


This information helps environmental authorities to monitor water condition and to design renovation actions.

Fig 1 and Fig 2 show a magnetic field map from a small lobby and a bathymetry map from a pond, respectively. We observe spatial heterogeneity in these data, such that some areas, especially close to the walls or in the middle of the pond, exhibit high variation, whereas other areas are flatter. This suggests that, in these environments, statistical modelling would benefit from an ability to adapt spatial dependency locally to the data. That way, we could provide the same modelling quality with a faster mapping time. This naturally also means that the action planning must be able to adapt, in real time, to the new data. These are the motivations why this thesis focuses on adaptation of the statistical model to different spatially and directionally dependent areas; quantification of each possible sensing path in terms of the predictive quality of the model; and design of action planning that is able, in real time, to find a new, approximately best solution leading to the desired predictive quality. In the following sections, we give our problem statement in more detail with a brief overview of our contributions, followed by introductions to each of these topics.

Fig 1. Indoor magnetic fields in a small lobby of an office building. From left, the ranges are [-22, 6] µT for the x component, [-11, 13] µT for the y component, and [-54, -16] µT for the z component.


Fig 2. Bathymetry data from a pond below a hydroelectric power plant and bottom composition of a river, from left to right, respectively. Water depth ranges between 0 and 3.5 metres, and composition from hard to soft. (Courtesy of Aquamarine Robots).

1.1 Problem statement and the main contributions

Our objective in this thesis is to develop computational methods for autonomous environmental modelling, where autonomy refers to the ability to decide the next actions based on the observed information in order to optimize the modelling task. Within this framework, our focus is on model adaptation, model quality representation and time-efficient action planning. These topics raise the questions addressed in this thesis.

First, given some data and a priori assumptions, how can we adapt the models to the data? Our approach is based on diffusion kernel adaptation. We develop methods for both isotropic and anisotropic versions of stationary and non-stationary Gaussian processes. We demonstrate that this approach is also suitable for adaptive sequential sampling, where the selection of the next sensing location is based on the model adapted with the newest data.

Second, given some statistical spatial model, how do we quantify sensing quality? Our approach is based on measuring the uncertainty related to the model, characterized by entropy. We prove that the mutual information between the model and the observations is the most convenient sensing quality when considering the prediction quality of the model. When the sensing path is itself uncertain, we prove that using a model that is marginalized over sensing paths gives a sensing quality that also accounts for the prediction quality of the environment, which is beneficial in action planning.


Third, given a sensing quality, how can we, in computationally feasible time, find a time-efficient sequence of actions that reaches the target information quality? Our approach is to utilize graph-based approximation algorithms together with appropriate transformations of sensing quality and travel time to the vertex prices and edge costs of a graph. We develop these transformations and demonstrate that adaptation of vertex prices provides favorable results for exploration.

time-efficient sequence of actions that reach target information quality? Our approach isto utilize graph based approximate algorithms together with appropriate transformationsof sensing quality and travel-time to the vertex prices and edge costs of a graph. Wedevelop these transformations and demonstrate that adaptation of vertex prices providesfavorable results for exploration.

Tables 1-3 list our contributions in different research fields and explain how they relate to and extend the state of the art.

Table 1. Research on non-stationary environmental models.

Research question: How to adapt anisotropic diffusion kernel parameters in a Gaussian process context?
State of the art: Structure tensor based adaptation methods with a manually defined diffusion scale range.
Our contribution: An exact structure tensor based adaptation method, which reduces the dimensionality of the problem.

Table 2. Research on sensing qualities.

Research question: What sensing quality criterion should be used to maximize the predictive quality of Gaussian processes?
State of the art: Different entropy based criteria; no clear understanding of how they relate and compare in the prediction quality of Gaussian processes.
Our contribution A: Proof that the mutual information between the model and observations is a generalization of another commonly used criterion.
Our contribution B: Proof that the mutual information between the model and observations provides the best prediction quality when used in design.

Research question: What criterion should be used to select actions that maximize the predictive quality of a model when the sensing path is also uncertain?
State of the art: Different criteria that balance between maximizing the expected modelling quality and minimizing the path uncertainty.
Our contribution A: A mutual information criterion between the model marginalized over sensing paths and the observations.
Our contribution B: Proof that this criterion also implicitly minimizes sensing path uncertainty when necessary for modelling.


Table 3. Research on informative action planning.

Research question: How to minimize the sensing time needed to attain a certain predictive quality of the model?
State of the art: Approximate graph-based algorithms when sensing locations are mutually independent.
Our contribution A: An algorithm to create approximately independent sensing locations, which enables graph-based methods to be utilized in informative path planning.
Our contribution B: An informative shortcutting heuristic to improve the approximate result.

1.2 Environmental modelling

We assume that we can observe our target environment directly with additive sensor noise. As such, we can define our measurement model through

z(s) = f(s) + w(s),

where z(s) is a noisy measurement, f(s) an unknown variable and w(s) sensor noise at location s. Without additional assumptions, a measurement at location s does not provide us any information about the unknown variable at another location t. In spatial statistics, the role of spatial dependency is to tie values from different locations into some reasonable limits. As stated by Cressie [1], this is because of the nature of the data, not due to modelling aspects. As such, we might say that dependency is a natural way to model the diffusion processes found in nature. One of the most common and oldest ways to model geostatistical processes has been Gaussian processes. The benefit is that the posterior distribution

p(f(t) | Z(s) = z_s) = ∫ p(f(t), f(s) | Z(s) = z_s) df(s)

conditioned on an observation z_s can be solved through linear equations. Also, as stated by Jaynes [2], the normal distribution is, for continuous random variables, the most convenient way to model unknown a priori knowledge with only some clue about the variation. For these reasons, we have also selected the Gaussian process framework for our environmental models.

1.2.1 Gaussian processes

Gaussian processes can be characterized through mean and covariance functions. These are the function-space counterparts of the mean vector and covariance matrix. If we assert that any normal distribution can be standardized, in function spaces this means that

cov(F(s), F(t)) = E[(K(s)Ω)(K(t)Ω)*]
               = E[(∫ K(s,u)Ω(u) du)(∫ K(t,v)Ω(v) dv)*]
               = σ_f² ∫ K(s,u)K(t,u)* du,    (1)

where Ω is a Gaussian white noise process with some variance σ_f², and K is a linear operator with linear functionals K(s) and K(t) transforming white noise to some basis s and t, respectively. That way, we can observe that the covariance function is simply a generalization from countable vector spaces to function spaces, with the requirement of positive semi-definiteness following from the squared form

cov(F, F) = σ_f² KK*.

Since the requirement for valid covariance functions is clear, these are commonly given explicitly with known properties. In this thesis, we have utilized two types of covariance functions. One of them, known as the Gaussian covariance function, is formed by convolving Gaussian white noise with a Gaussian kernel. This is equivalent to saying that our observations are due to some unknown initial spatial concentration (distributed according to Gaussian white noise) that has diffused over some time, defined by a model parameter. The other covariance function is a non-stationary version of this, such that the diffusion process may have a different time scale at each location.

Now, given both the mean and covariance functions, in order to build the predictive distribution on some basis, we can proceed with the traditional conditional equations

E[F_S | Z_T = z_T] = E[F_S] + cov(F_S, Z_T) cov(Z_T, Z_T)⁻¹ (z_T − E[F_T])

and

cov(F_S, F_S | Z_T) = cov(F_S, F_S) − cov(F_S, Z_T) cov(Z_T, Z_T)⁻¹ cov(Z_T, F_S),

where S is a set of n prediction locations and T a set of m measurement locations.
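As a concrete illustration, the Python sketch below implements these conditional equations for a one-dimensional stationary Gaussian covariance. The kernel form, noise level and sample locations are illustrative assumptions, not the implementation used in the thesis experiments.

```python
# Minimal sketch of Gaussian process conditioning (illustrative assumptions:
# a squared-exponential covariance, zero prior mean, hypothetical parameters).
import numpy as np

def gaussian_cov(s, t, sigma_f=1.0, ell=1.0):
    """cov(F(s), F(t)) for a stationary Gaussian covariance function."""
    d = s[:, None] - t[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / ell)**2)

def gp_posterior(S, T, z_T, sigma_n=0.1):
    """Posterior mean and covariance at locations S given noisy samples z_T at T."""
    K_ST = gaussian_cov(S, T)
    K_TT = gaussian_cov(T, T) + sigma_n**2 * np.eye(len(T))   # cov(Z_T, Z_T)
    mean = K_ST @ np.linalg.solve(K_TT, z_T)                  # zero prior mean assumed
    cov = gaussian_cov(S, S) - K_ST @ np.linalg.solve(K_TT, K_ST.T)
    return mean, cov

# Usage: predict on a dense grid from five noisy samples.
T = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
z_T = np.sin(T) + 0.1 * np.random.randn(len(T))
S = np.linspace(0.0, 5.0, 50)
mu, Sigma = gp_posterior(S, T, z_T)
```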

1.2.2 Frequency domain Gaussian process regression

The major disadvantage of the conditional predictive equations follows from the O(n³) computational complexity, where n is the number of observations used in prediction.


The frequency domain approach is based on the assumption that our covariance function is formed through convolution with a white noise process, i.e.

cov(F(s), F(t)) = E[(∫ K(s−u)Ω(u) du)(∫ K(t−v)Ω(v) dv)*]
               = σ_f² ∫ K(s−u)K(t−u)* du
               = σ_f² ∫ K(v)K(v+r)* dv,

where r = t − s. We observe that the covariance is independent of the locations s and t, a property known as stationarity. The key point in frequency domain Gaussian process regression is that if we reduce the number of frequency components, we allow the environment to repeat itself beyond the area of interest, see e.g. [3]. This naturally leads to more efficient regression. The frequency domain regression equations are now given by

E[F_ξ | Z_T = z_T] = E[F_ξ] + |V| (Q_T* Q_T + |V| σ_w² Λ⁻¹)⁻¹ Q_T* (z_T − E[F_T])

for the predictive mean and

cov(F_ξ, F_ξ | Z_T) = |V|² σ_w² (Q_T* Q_T + |V| σ_w² Λ⁻¹)⁻¹    (2)

for the predictive covariance, where ξ are some frequency components, Λ = cov(F_ξ, F_ξ) is the diagonal matrix of the a priori process covariance in the frequency domain, σ_w² = cov(W, W) is the scalar sensor noise variance, |V| is the volume of the spatial domain under interest, and Q_T is the change-of-basis matrix consisting of the frequency components ξ at the locations T. Throughout this thesis, in order to speed up computations, we utilize Fourier domain Gaussian process regression in experiments where applicable. This means the cases where the Gaussian processes are stationary.
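The sketch below illustrates the reduced-rank idea in Python for a one-dimensional domain: the posterior mean of the frequency coefficients F_ξ is computed from a small number of Fourier components and then mapped back to the spatial domain. The basis construction and the Gaussian-shaped spectral prior lam standing in for Λ are assumptions made for this example, not the exact setup of the thesis.

```python
# Minimal sketch of reduced-rank GP regression in a truncated Fourier basis.
import numpy as np

def fourier_basis(t, n_freq, V):
    """Complex exponentials exp(2*pi*i*k*t/V) for k = -n_freq..n_freq."""
    k = np.arange(-n_freq, n_freq + 1)
    return np.exp(2j * np.pi * np.outer(t, k) / V)  # rows: locations, cols: xi

V = 10.0                                    # volume (length) of the domain
T = np.sort(np.random.uniform(0, V, 40))    # measurement locations
z_T = np.sin(2 * np.pi * 2 * T / V) + 0.05 * np.random.randn(len(T))

n_freq, sigma_w = 8, 0.05
k = np.arange(-n_freq, n_freq + 1)
# Assumed Gaussian-shaped spectral density as the diagonal prior Lambda.
lam = np.exp(-0.5 * (2 * np.pi * k / V)**2)
Q_T = fourier_basis(T, n_freq, V)

A = Q_T.conj().T @ Q_T + V * sigma_w**2 * np.diag(1.0 / lam)
f_xi = V * np.linalg.solve(A, Q_T.conj().T @ z_T)   # posterior mean E[F_xi | z_T]

# Back to the spatial domain on a dense grid.
S = np.linspace(0, V, 200)
f_S = (fourier_basis(S, n_freq, V) @ f_xi).real
```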

1.2.3 Spatial moving average models

Our approach to modelling non-stationarity over our exploration area is to utilize the approach by Higdon et al. [4], also known as a spatial moving average model, see [1, Section 4.1.4]. If we return to equation (1) and state that K_s(u) = K(s,u) and K_t(u) = K(t,u) are two different convolution kernels, we end up with the definition of Higdon et al.,

cov(F(s), F(t)) = σ_f² ∫ K(s,u)K(t,u)* du
               = σ_f² ∫ K_s(s−u)K_t(t−u)* du.    (3)

For certain types of convolution kernels, the moving average covariance can be attained in analytic form. In his thesis, Paciorek [5] derived moving average covariance functions for different locally stationary processes in multiple dimensions. When the local convolution kernel is an anisotropic Gaussian, i.e.

K_s(u) = |2πΣ_s|^(−0.5) exp(−0.5 uᵀ Σ_s⁻¹ u),

where Σ_s is positive semidefinite, from equation (3) we find that

cov(F(s), F(t)) = σ_f² (2π)^(0.5n) (|Σ_s||Σ_t||Σ_s⁻¹ + Σ_t⁻¹|)^(−0.5) exp(−0.5 (s−t)ᵀ (Σ_s + Σ_t)⁻¹ (s−t)),

where n is the dimension of the input space. The remaining question is how to adapt Σ_s to represent local diffusion when we have only a restricted number of observations.
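The closed form above translates directly into code. Below is a small Python sketch evaluating the non-stationary covariance between two locations; the diffusion kernel matrices Sigma_s and Sigma_t are illustrative, since adapting them from data is precisely the topic of Chapter 2.

```python
# Direct transcription of the non-stationary moving-average covariance above,
# for two locations with their own (illustrative) diffusion kernels.
import numpy as np

def ns_cov(s, t, Sigma_s, Sigma_t, sigma_f=1.0):
    n = len(s)
    pref = (2 * np.pi)**(0.5 * n) * (
        np.linalg.det(Sigma_s) * np.linalg.det(Sigma_t)
        * np.linalg.det(np.linalg.inv(Sigma_s) + np.linalg.inv(Sigma_t))
    )**(-0.5)
    d = np.asarray(s) - np.asarray(t)
    quad = d @ np.linalg.solve(Sigma_s + Sigma_t, d)
    return sigma_f**2 * pref * np.exp(-0.5 * quad)

# Usage: an isotropic kernel at s, an elongated (anisotropic) one at t.
Sigma_s = np.eye(2)
Sigma_t = np.array([[4.0, 0.0], [0.0, 0.25]])
print(ns_cov([0.0, 0.0], [1.0, 1.0], Sigma_s, Sigma_t))
```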

1.2.4 Model adaptation

With the selected mean and covariance functions, Gaussian process regression is straightforward. However, without previous knowledge, the selection of appropriate model parameters to explain the short- and long-term variation in the data is challenging. We might ask: what part of the variation can be explained by noise? How do we know whether correlation between locations is due to spatial dependency or a longer-term trend? Is the spatial dependency translation and rotation invariant? These aspects can be studied if we have enough observations from the target environment. The commonly used method is to evaluate the marginal measurement likelihood

p(z_A | θ) = ∫ p(f | θ) p(z_A | f, θ) df

with respect to different parameters θ. However, if the number of observations is small compared to that of the parameters, which is the usual case with anisotropic spatial moving average models, finding the correct parameters is a challenge. One option is to assume that the parameters follow some smooth spatial process, as in [6, Section 11.1].


However, how can we define that process, since it is also essentially related to the observations? Another option is to adapt the parameters based on some local pattern, as in [7], [8]. The latter is less vulnerable to a priori assumptions; however, how do we guarantee that such an approach reaches the correct parameter values? This is considered in Chapter 2.

1.3 Sensing quality

Sensing quality defines how well our observations can explain the phenomenon of interest. For example, in water quality monitoring, in order to follow temporal trends, sampling may be performed at selected sites at certain time intervals. Sensing quality should then define the ability to generalize local trends to cover the whole lake; the better the sensing quality, the more we can trust that these observations are able to assess the condition of the lake.

From the modelling aspect, sensing quality is directly related to the uncertainty of the spatial model; the less uncertain our model, the better it can predict values. In order to understand the concept of uncertainty, and to motivate our approach to sensing quality, we give a brief review of the definitions of entropy and mutual information. Furthermore, we show how the mutual information based sensing quality relates to the optimality of sensing designs and to approximation algorithms.

1.3.1 On entropy and mutual information

Assume that we have a random variable (or, more generally, a stochastic process) F with a distribution P_F defined on some sigma-algebra Σ. Next, let us assume we have a set of n observations drawn according to this distribution that fall into m ≤ n disjoint sets of the sigma-algebra. Now, the number of different sequences that would result in the same distribution is given by

|s_n| = n! / (n_1! ··· n_m!),

which in statistical mechanics corresponds to defining how many different microstates would result in the same observed macrostate [9]. Taking logarithms and using Stirling's approximation, we find

log(|s_n|) = log(n!) − log(n_1!) − … − log(n_m!)
           ≈ (n log(n) − n) − (n_1 log(n_1) − n_1) − … − (n_m log(n_m) − n_m)
           = n log(n) − Σ_{i=1:m} n_i log(n_i),

and now, stating that the sampled distribution converges to the true one (n_i → P_i n), we have

lim_{n→∞} log(|s_n|) = n log(n) − Σ_{i=1:m} P_i n log(P_i n)
                     = n log(n) − Σ_{i=1:m} P_i n log(n) − Σ_{i=1:m} P_i n log(P_i)
                     = −n Σ_{i=1:m} P_i log(P_i),

which is the familiar information entropy. Now, we can express this in a compressed space within shorter sequences s_{n_c} such that

m^{n_c} = |s_n|
⇔ n_c log(m) = log(|s_n|)
⇔ n_c = log(|s_n|) / log(m) = −n Σ_{i=1:m} P_i log(P_i) / log(m),

which is the general lower bound for sequence encoding. Information entropy is then expressed as the average achievable compression of a sequence, i.e.

H_m(F) = n_c / n = −Σ_{i=1:m} P_i log(P_i) / log(m).


However, if we instead compress the number of representative sets to m_c, we find

m_c^n = |s_n|
⇔ n log(m_c) = log(|s_n|)
⇔ log(m_c) = −Σ_{i=1:m} P_i log(P_i)
⇔ m_c = exp(−Σ_{i=1:m} P_i log(P_i)),    (4)

which was used in [10] to approximate how well different models suit localization. This describes the average number of different values required to represent the distribution.

We notice that the entropy depends on the elements s_1, …, s_m of the sigma-algebra with associated probability measures P_i = P(σ_i). Now, let us state that we have a continuous random variable F : Ω → R with the Borel sigma-algebra. Let us also consider that we have a density function

p(f) = dP(F(ω) < f) / d(λ(F(ω) < f)),

where λ is the Lebesgue measure; notice that λ(F(ω) < f) = λ([−∞, f]). For simplicity, we later use the notation λ(f) := λ(F(ω) < f). Now, if we express (4) as

m_c = exp(−∫ dP(F(ω) < f) log(dP(F(ω) < f)))
    = exp(−∫ p(f) d(λ(f)) log(p(f) d(λ(f))))
    = (1 / d(λ(f))) exp(−∫ p(f) d(λ(f)) log(p(f))),

it naturally results in infinite values. This simply says that for the Borel sigma-algebra we can select extremely small open intervals, in any case resulting in an infinite number of sets. In practical cases, we are more interested in how large a region of the space the observations cover. This is given by

r_c = m_c · d(λ(f)) = exp(−∫ p(f) d(λ(f)) log(p(f))),

where the exponent is the familiar differential entropy. As such, we state that the differential entropy h(F) = log(r_c) is a natural metric for continuous variables.
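The "effective number of values" reading of equation (4) is easy to check numerically. In the following toy Python example (with made-up distributions), exp(H) equals the alphabet size for a uniform distribution and shrinks as the distribution concentrates.

```python
# Toy check of the "effective number of states" interpretation of exp(H).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))    # natural base, as in the thesis

uniform = np.ones(8) / 8
peaked = np.array([0.90, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.005])

print(np.exp(entropy(uniform)))   # 8.0  -> all 8 states effectively used
print(np.exp(entropy(peaked)))    # ~1.6 -> far fewer effective states
```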


Now, let’s assume we are given a set observations zA and we are willing to know howmuch information they would provide for our stochastical process F . We can furtherassume that these observations are from known locations A. The entropy over posteriorprocess is now given by h(F |ZA = zA) and the conditional entropy over all possibleobservations is given as an expected value h(F |ZA) =

∫h(F |ZA = zA)p(zA)d(λ (zA)). In

[11], [10], in order to measure how much sensing at locations A would decrease theposterior entropy of our stochastical process, we used mutual information I(F ;ZA) =

h(F)−h(F |ZA). We notice that this characterizes the ratio of space coverage decreasesuch that

ratio = exp(I(F |ZA))

=exp(h(F))

exp(h(F |ZA))

=m(F)∗d(λ ( f ))

m(F |ZA)∗d(λ ( f ))

=m(Y )

m(Y |ZA)

=exp(H(F))

exp(H(F |ZA)),

which means that that the mutual information I(F |ZA) = h(F)−h(F |ZA) = H(F)−H(F |ZA) tights Shannon entropy and differential entropy to the same metric.

1.3.2 On optimality of sensing designs

The selection of sensing locations can be viewed as a design problem

Z_A = X_A F + X_A W,

where X_A = (x_1, …, x_n)ᵀ is a design matrix such that each x_i belongs to the standard basis X, F is our Gaussian process marginalized to some finite basis, W is Gaussian measurement noise on the same basis with variance matrix E[WW*] = σ_n² I, and Z_A is the random measurement vector for design A. Now, the optimal estimator for normal linear regression is simply the posterior distribution with mean

E[F | Z_A = z_A] = E[F] + E[F Z_A*] E[Z_A Z_A*]⁻¹ (z_A − E[Z_A])
                = E[F] + E[FF*] X_Aᵀ (X_A E[FF*] X_Aᵀ + σ_n² I_A)⁻¹ (z_A − E[Z_A])

and covariance

E[FF* | Z_A] = E[FF*] − E[F Z_A*] E[Z_A Z_A*]⁻¹ E[Z_A F*]
            = E[FF*] − E[FF*] X_Aᵀ (X_A E[FF*] X_Aᵀ + σ_n² I_A)⁻¹ X_A E[FF*].    (5)

Using the matrix inversion lemma, the posterior covariance (5) can be written as

E[FF* | Z_A] = (σ_n⁻² X_Aᵀ X_A + E[FF*]⁻¹)⁻¹.    (6)

As stated by Chaloner and Verdinelli [12], if we let the a priori covariance E[FF*] in (6) be large compared to the OLS variance σ_n² (X_Aᵀ X_A)⁻¹, the posterior covariance is dominated by the selection of the design matrix. Such an assumption would move us from Bayesian to traditional non-Bayesian optimal designs.

Now, given the model posterior distribution, we can define metrics that quantify, for each design, the uncertainty related to the model. A common approach is the expected information gain that the observations can provide for the model. This is given by

E_{Z_A}[h(F) − h(F | Z_A = z_A)] = h(F) − h(F | Z_A),

which is observed to be the mutual information between the process F and the observations Z_A. Another common metric is the expected information gain that the observations Z_A provide for the predictions Z_B with design B. This is given by

E_{Z_A}[h(Z_B) − h(Z_B | Z_A = z_A)] = h(Z_B) − h(Z_B | Z_A),

where

Z_B = X_B F + X_B W.

As was explained in Section 1.3.1, the mutual information simply means the relative increment in information. Compared to the posterior entropies h(F | Z_A = z_A), h(Z_B | Z_A = z_A) or the conditional entropies h(F | Z_A), h(Z_B | Z_A), it is less sensitive to discretization or model structure (different parameters).
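For Gaussian models, this criterion has a closed form: with a prior covariance P = E[FF*] and noise variance σ_n², I(F; Z_A) = 0.5 log det(I + P_AA/σ_n²), where P_AA is the prior covariance restricted to the sensed locations A. The Python sketch below scores candidate designs with it; the grid, kernel and noise level are illustrative assumptions.

```python
# Sketch of the mutual information sensing quality I(F; Z_A) for a Gaussian
# model, used to score and compare designs (illustrative setup).
import numpy as np
from itertools import combinations

def mutual_information(P, A, sigma_n):
    P_AA = P[np.ix_(A, A)]
    M = np.eye(len(A)) + P_AA / sigma_n**2
    return 0.5 * np.linalg.slogdet(M)[1]

# Prior over 20 grid locations with a Gaussian covariance.
x = np.linspace(0, 10, 20)
P = np.exp(-0.5 * (x[:, None] - x[None, :])**2)

# Exhaustively score all 3-location designs and keep the best.
best = max(combinations(range(20), 3),
           key=lambda A: mutual_information(P, list(A), sigma_n=0.1))
print(best)  # well-spread locations win, in line with Fig 3
```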

Fig 3 demonstrates how different designs relate to the sensing quality. In this example, each design has the same number of sensing locations, but the sampling interval varies. We observe that the mutual information sensing criterion favors designs that provide a small average posterior uncertainty over the area of interest. Notice that the mutual information is computed in the frequency domain, where the frequency components are selected such that the process repeats itself outside the area of interest. This provides the same mutual information value as in the spatial domain, but it is computationally more stable. In the spatial domain, for nearby locations we face a problem with the conditional entropies, lim_{δx→0} h(F(x) | F(x+δx), Z_A) ≈ h(F(x) | F(x+δx)) → −∞, even though they cancel each other in the mutual information computation.


Fig 3. Sensing qualities for different designs: (a) sampling interval = 2; (b) sampling interval = 0.5; (c) sampling interval = 1. The blue line represents the true process f, red stars the samples, the black line the posterior mean, the dark gray area the 95th percentile of the posterior distribution, and the light gray area the 95th percentile of the prior distribution. The Gaussian process follows from the convolution of Gaussian white noise with variance σ_f² = 1 and a Gaussian kernel K(u) = (2π)^(−0.5) exp(−0.5u²).

Now, in order to work with positioning uncertainty, we can consider the uncertainties of designs. Since for any x_i, x_i f is a projection such that f(x) = x_i f for some location x, we can give an isomorphism

X_A ≅ x_A,    (7)

where x_A is a sensing path. As such, the uncertainty over designs can be given through a random sensing path X_A. For the sensing quality, we now observe two alternatives: the first is to consider the expected mutual information over sensing paths when the sensor information is not used in positioning, and the second is to include positioning in the sensing quality metric as well. For the former, we observe

h(F | X_A) − h(F | Z_A, X_A) = ∫ [h(F | X_A = x_A) − h(F | Z_A, X_A = x_A)] p(x_A) dx_A
                             = ∫ [h(F) − h(F | Z_A, X_A = x_A)] p(x_A) dx_A,

where h(F | X_A) = h(F) simply means that, without sensing, the design itself does not improve our knowledge about the environment.


For the latter, we simply have h(F) − h(F | Z_A), which uses the marginalized density

p(f | z_A) = ∫ p(f | z_A, x_A) p(x_A | z_A) dx_A,

where p(x_A | z_A) is the posterior sensing path distribution. Next, we can show some optimization bounds that connect these two metrics. Assuming a discrete sensing path distribution, for the lower bound of sequential learning, in Section 3.3 we find

h(F) − h(F | Z_A) ≥ h(F | X_A) − h(F | Z_A, X_A) − h(X_A | Z_A),

and, respectively, for the upper bound we find

h(F) − h(F | Z_A) ≤ h(F | X_A) − h(F | Z_A, X_A).

Similar approaches were presented for parameter distributions by Krause in [13, Chapter 14], which is outside the scope of this thesis. We always naively assume that the current parameters, even when obtained through adaptation, are correct.

1.3.3 On complexity of sensing algorithms

In optimization theory, NP refers to problems whose solutions can be verified in polynomial time, i.e., with complexity O(n^k), where n is the number of inputs. Such a verifier is commonly called an oracle. If the optimal solution can be returned in polynomial time, then we call these P problems. If every NP problem can be reduced to some problem in polynomial time, then that problem is called NP-hard. In practice this means that for every NP problem we can find a solution in polynomial time by using an oracle of an NP-hard problem. This naturally means that none of the NP-hard problems are P problems, unless P = NP. If we additionally require that an NP-hard problem is also an NP problem, then we call such a problem NP-complete.

For sensing problems where we try to find the optimal set of sensing locations, we usually work with NP-hard problems. We observe two practical cases: finding the smallest-cost subset of locations that provides a certain sensing quality, and finding the k-subset of locations that provides the maximum sensing quality. Both of these can be shown to be NP-hard problems. If we assume that the mutual information in Section 1.3.2 is a non-decreasing submodular set function, as will be proved in Section 3.2, then fortunately there exist greedy approximation algorithms for both, given by [14] and [15]; a minimal sketch of the greedy rule follows.
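Below is a minimal Python sketch of the greedy rule for the k-subset problem, assuming the Gaussian mutual information criterion of Section 1.3.2 as the objective; the prior covariance and parameter values are illustrative. For non-decreasing submodular objectives, this rule attains the classical (1 − 1/e) approximation factor.

```python
# Greedy k-subset selection under a (submodular) mutual information objective.
import numpy as np

def mi(P, A, sigma_n):
    """I(F; Z_A) = 0.5 logdet(I + P_AA / sigma_n^2) for a Gaussian prior P."""
    P_AA = P[np.ix_(A, A)]
    return 0.5 * np.linalg.slogdet(np.eye(len(A)) + P_AA / sigma_n**2)[1]

def greedy_design(P, k, sigma_n):
    """Repeatedly add the location with the largest marginal gain in MI."""
    A, candidates = [], set(range(P.shape[0]))
    for _ in range(k):
        best = max(candidates, key=lambda i: mi(P, A + [i], sigma_n))
        A.append(best)
        candidates.remove(best)
    return A

x = np.linspace(0, 10, 20)
P = np.exp(-0.5 * (x[:, None] - x[None, :])**2)
print(greedy_design(P, 5, sigma_n=0.1))
```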


1.4 Autonomous sensing

By autonomous sensing, we refer to action planning that, based on the most recent observations, is able to decide the next actions in order to optimize sensing quality. This differs from sensor placement or survey sampling problems in the ability to improve designs while sensing. In the simplest case, we could select a sequence of sensing locations in advance and preplan the actions required to move between these locations. However, in many cases, positioning accuracy may restrict the applicability of this approach: the selected actions may not lead to the desired sensing locations, and if the sensor information is used to improve positioning accuracy, it may be necessary to revise actions to provide a qualified model. In addition, if the environmental models are updated while sensing, this may remarkably change the action planning. These different scenarios are introduced in the following subsections.

1.4.1 Sensing with adaptive environmental models

Our environmental model introduced in Section 1.2 is based on Gaussian processes, whose a priori and posterior entropies conditioned on the parameters are given by

h(F | Θ = θ) = (n/2)(1 + log(2π)) + (1/2) log(det(cov(F, F | Θ = θ)))

and

h(F | Z_A, Θ = θ) = (n/2)(1 + log(2π)) + (1/2) log(det(cov(F, F | Θ = θ, Z_A))),

respectively, where A refers to the design of sensing locations and n is the dimensionality of F. These entropies constitute the mutual information sensing quality function, introduced in Section 1.3.2, by

h(F | Θ = θ) − h(F | Z_A, Θ = θ),

where we have highlighted the dependency on the parameters.

If the parameters are independent of the measurements, then we can preplan all actions, since the a priori and posterior covariances are conditionally independent of the observations, given the parameters. This is, however, not the most convenient way unless we know that our selected parameters represent the environment well. The simplest way to enable flexibility in modelling and action selection is to utilize, after each action, the parameters that best describe the observations.


This was used by the authors in [8] for autonomous sensing. A more sophisticated approach would be to utilize conditional entropies, such that for the sensing quality we would have

h(F | Θ) − h(F | Z_A, Θ) = ∫ h(F | Θ = θ) p(Θ = θ | Z_A = z_A) dθ − ∫ h(F | Z_A, Θ = θ) p(Θ = θ | Z_A = z_A) dθ,

which was used by Singh in [16] for adaptive path planning. As was explained by Krause in [13, Chapter 14], this serves as an upper bound for the actual mutual information

h(F) − h(F | Z_A),

where p(f | z_A) = ∫ p(f | z_A, θ) p(θ | z_A) dθ is the actual, typically non-Gaussian, posterior distribution of the model.

Assuming all possible sensing locations constitute the basis of F, the non-decreasing submodularity property also holds for all these adaptive approaches. As such, without any additional constraints, we can utilize the greedy approach with near-optimal results. However, if we optimize sensing collectively with respect to time and information, different approaches, such as those in [16, 17], must be considered. This is explained in the next section.

1.4.2 Time-efficient sensing

If we consider exploration time, greedy selection is not very practical for action planning, since the greedy sampling points are, in general, far apart. This means that there would be midpoints that we could also sample to provide an additional information increment. Consider the following constrained information maximization problem:

max_A  h(F) − h(F | Z_A)
subject to  T(A) ≤ C_T,

where A is again the design of sensing locations, C_T is a time constraint, and T maps A to the travel time. Given start and end locations, this can, in general, be stated as an orienteering problem [18]. In mapping cases, we may be more interested in collecting a certain level of accuracy for the map in minimal time.


As such, the optimization problem is given by

min_A  T(A)
subject to  h(F) − h(F | Z_A) ≥ C_I,

where C_I is an information constraint. This can be stated as a quota stroll, quota tour or quota TSP problem, depending on whether we are given start and end locations, just a start location, or require that the solution forms a cycle, respectively.

Both of these problems are extremely challenging because of the additional constraints and the sequential order. Since the travelling salesman problem can be reduced to them, they are NP-hard. An additional challenge is posed if the sensing locations are mutually dependent, which is the typical case in environmental modelling. For these cases, one option is to try to decompose the environment into approximately independent regions, see [16], and utilize known constrained TSP algorithms to find approximate solutions, such as [19] and [20]. For the former problem, assuming that the sensing quality is submodular, one could also utilize existing orienteering algorithms, see [17], [21], known to be quasi-polynomial in running time. Also, if the search space can be heavily bounded, one could use search algorithms to find optimal [22] or asymptotically optimal [23], [24] solutions. Finally, if travel path optimality is not an issue, one could use simple greedy heuristics to find feasible solutions, see [25]; a sketch of such a heuristic follows.
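As a baseline for the quota problem, the following Python sketch implements a simple greedy feasibility heuristic of the kind referred to above: repeatedly move to the candidate with the best ratio of marginal information gain to travel time until the quota C_I is met. The one-dimensional layout, kernel and quota value are made-up illustration values; this is not the graph-based algorithm developed in Chapter 4.

```python
# Naive greedy baseline: information gain per unit travel time until quota.
import numpy as np

def mi(P, A, sigma_n):
    P_AA = P[np.ix_(A, A)]
    return 0.5 * np.linalg.slogdet(np.eye(len(A)) + P_AA / sigma_n**2)[1]

def greedy_quota_path(x, P, start, C_I, sigma_n=0.1, speed=1.0):
    A, pos, t = [start], start, 0.0
    while mi(P, A, sigma_n) < C_I:
        rest = [i for i in range(len(x)) if i not in A]
        if not rest:
            break
        # Marginal information gain per unit travel time for each candidate.
        score = lambda i: ((mi(P, A + [i], sigma_n) - mi(P, A, sigma_n))
                           / max(abs(x[i] - x[pos]) / speed, 1e-9))
        nxt = max(rest, key=score)
        t += abs(x[nxt] - x[pos]) / speed
        A.append(nxt)
        pos = nxt
    return A, t

x = np.linspace(0, 10, 20)
P = np.exp(-0.5 * (x[:, None] - x[None, :])**2)
path, t = greedy_quota_path(x, P, start=0, C_I=5.0)
print(path, t)
```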

1.4.3 Sensing under positioning uncertainties

So far, we have considered autonomous sensing where each action leads to a predefined location. Such an assumption can be made if we can rely on some accurate positioning system (e.g. satellite, laser scanner or camera based systems) guaranteeing that positioning uncertainty has only a small effect on the modelling accuracy. When positioning inaccuracy has a remarkable impact on the modelling, it should be considered as well. In addition, if the collected sensor information is used in positioning, our action selection must ensure that positioning uncertainties do not corrupt the modelling quality. Let us assume, as in Section 1.3.2, that the sensing path is given by a stochastic process X_A, where A is the design. Now, the model distribution conditioned on the measurements is given by

p(f | z_A) = ∫ p(f, x_A | z_A) dx_A    (8)

and if we assume that the joint distribution is Gaussian, then we can analyze the posterior model distribution efficiently in closed form.


utilize a Rao-Blackwellized approach, where a finite number of model and sensing path pairs (f, x_A) are used to represent the joint distribution.

If we can factorize the joint distribution as

$$p(f, x_A \mid z_A) = p(f \mid x_A, z_A)\, p(x_A),$$

then we are just dealing with inaccurate sensing locations that cannot be improved while sensing. In that case, we can apply the previously described exploration approaches. However, if the joint distribution is factorized as

$$p(f, x_A \mid z_A) = p(f \mid x_A, z_A)\, p(x_A \mid z_A),$$

then we are working with the simultaneous localization and mapping (SLAM) problem. The benefit is that we can improve positioning by active sensing. For exploration, this means that, at each iteration i, based on the current knowledge of the environment, we can try to find the best actions A giving us the sensing quality

$$h(F) - h(F \mid Z_A),$$

where the model distribution p(f | z_A) is obtained through marginalization, see equation (8). If we consider pure exploration, optimizing against the above sensing quality is well argued. However, it is not clear whether optimizing purely against sensing quality would also result in a low level of positioning uncertainty; that is where we propose confirmation in Section 3.3. Another solution would be to consider the joint distribution and the mutual information over it, see, e.g., [26], [27]. That is the traditional SLAM approach, which balances between exploration and exploitation.


2 Model adaptation under sequential sensing

We consider the problem of how to estimate the local smoothing kernel parameters, given some sequence of observations z_A. For moving average models, the marginal likelihood is given by

$$p(z_A \mid \theta) = \int \prod_{s \in x_A} p(z(s) \mid \theta, \omega)\, p(\omega \mid \theta)\, d\omega \tag{9}$$

with

$$p(z(s) \mid \theta, \omega) = \mathcal{N}(z(s);\, f(s),\, \sigma_n^2), \qquad p(\omega(du) \mid \theta) = \mathcal{N}(\omega(du);\, 0,\, \sigma_f^2\, du)$$

and $f(s) = \int K_s(s-u)\,\omega(u)\,du$ with the local kernel $K_s$ from equation (3). Following from the normal distribution, the marginal likelihood (9) is now given by

$$p(z_A \mid \theta) = \mathcal{N}\!\left(z_A;\, 0_A,\, \sigma_f^2 K_A K_A^* + \sigma_n^2 I_A\right),$$

which we can maximize with respect to the parameters θ. Now, we observe that the parameter estimation problem involves the two unknown variance parameters σ_f² and σ_n², and kn, k ≥ 1, local kernel parameters. This means that we are working with a highly underdetermined system, such that the same observations can be explained by different selections of parameters. One solution is to tie the parameters under some common model such that the number of parameters can be reduced, see e.g. [6], [28]. Assuming the local kernel is anisotropic Gaussian, another option is to adapt each local convolution kernel independently, using structure tensor based adaptation, see e.g. [7], [29]. Both of these approaches are challenging if one has to do parameter estimation in a sequential manner. In the following sections, we propose our approach based on exact adaptation of anisotropic diffusion kernels. The benefit is that we do not need to define any additional model or adaptation boundaries.

2.1 Related work

Structure tensor based diffusion kernel adaptation dates back to shape adaptation in computer vision, where the objective was to find a structure descriptor that is invariant to different scales and affine transformations. Lindeberg and Gårding [30] proposed an iterative approach where, at each iteration, a nonisotropic Gaussian kernel shape parameter (a symmetric matrix) was updated with the inverse of a local structure tensor. Middendorf and Nagel [29] adopted this for image flow estimation, together with some restricting learning trade-off parameters. Lang et al. [7] used a similar approach to adapt the scale hyperparameter in the context of nonstationary moving average squared exponential covariance functions. A similar approach was also utilized in [8] for sparse sample sets, although, instead of structure tensors, the authors utilized the expected outer product of local gradients computed using different sparse data points.

Compared to all those anisotropic diffusion kernel adaptations, which only deal with the rotation of the anisotropic scale parameters and the relative scale between the smallest and largest scale parameters, our approach estimates the true scale based on the exact connection between the local structure and the anisotropic Gaussian kernel.

2.2 Diffusion kernel adaptation

We start connecting the local structure to the anisotropic kernel by introducing the expected gradient outer product

$$
\begin{aligned}
E[\nabla f(s)(\nabla f(s))^*] &= \iint (s-u)(s-v)^*\,\Sigma^{-2}\,((2\pi)^d|\Sigma|)^{-1} \exp\!\big(-0.5\,(s-u)^*\Sigma^{-1}(s-u)\big) \\
&\qquad\quad \cdot \exp\!\big(-0.5\,(s-v)^*\Sigma^{-1}(s-v)\big)\,E[\omega(u)\omega(v)\,du\,dv] \\
&= \sigma_f^2 \int (s-u)(s-u)^*\,\Sigma^{-2}\,((2\pi)^d|\Sigma|)^{-1} \exp\!\big(-(s-u)^*\Sigma^{-1}(s-u)\big)\,du,
\end{aligned} \tag{10}
$$

where we have assumed that the diffusion parameter $\Sigma$ is location invariant, and for the white noise $\omega$ we have $E[\omega(u)\omega(v)\,du\,dv] = \sigma_f^2\,du$ for all $v = u$, and 0 otherwise. Integrating over $u$, we find

$$E[\nabla f(s)(\nabla f(s))^*] = \sigma_f^2\,\pi^{-d/2}\,2^{-d-1}\,|\Sigma|^{-1/2}\,\Sigma^{-1},$$

and, multiplying each side by itself, we get

$$E[\nabla f(s)(\nabla f(s))^*]\;E[\nabla f(s)(\nabla f(s))^*] = \sigma_f^4\,\pi^{-d}\,4^{-d-1}\,|\Sigma|^{-1}\,\Sigma^{-1}\Sigma^{-1},$$

which leads to

$$|\Sigma|\,\Sigma\,\Sigma = \sigma_f^4\,\big(\pi^d\,4^{d+1}\,E[\nabla f(s)(\nabla f(s))^*]\,E[\nabla f(s)(\nabla f(s))^*]\big)^{-1}. \tag{11}$$


Now, due to positive definiteness, we can state $\Sigma = U\Lambda U^*$, which gives, when the dimensionality $d = 2$,

$$|\Sigma|\,\Sigma^2 = \lambda_1\lambda_2\,U\Lambda^2 U^* = UDU^*,$$

where

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}, \qquad U = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \qquad \text{and} \qquad D = \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix}.$$

Finally, by solving the eigenvalues $d_1$, $d_2$ and eigenvectors $u_1$, $u_2$, we get $\lambda_1 = (d_1^3/d_2)^{1/8}$ and $\lambda_2 = (d_2^3/d_1)^{1/8}$.
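To make this recovery step concrete, the following is a minimal NumPy sketch (d = 2) of solving Σ from the matrix |Σ|ΣΣ of equation (11) via its eigendecomposition; `M` denotes a numerical estimate of that matrix.

```python
import numpy as np

def sigma_from_matrix_power(M):
    """Recover Sigma from M = |Sigma| Sigma Sigma for d = 2.

    M is symmetric positive definite, so M = U diag(d1, d2) U^T with
    d_i = lambda_i^3 lambda_j, giving lambda_i = (d_i^3 / d_j)^(1/8).
    """
    d, U = np.linalg.eigh(M)            # eigenvalues in ascending order
    lam1 = (d[1] ** 3 / d[0]) ** 0.125  # scale of the dominant direction
    lam2 = (d[0] ** 3 / d[1]) ** 0.125
    U = U[:, ::-1]                      # reorder eigenvectors to (lam1, lam2)
    return U @ np.diag([lam1, lam2]) @ U.T
```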

Next, we need to estimate the expected gradient outer product. For that, we can utilize the ergodicity of the white noise process, such that

$$E[\omega(u)\omega(v)\,du\,dv] = \frac{1}{|X|}\int_{h\in X} \omega(u+h)\,\omega(v+h)\,dh\,du\,dv,$$

and, substituting this into (10) together with the observation that $E[\omega(u)\omega(v)\,du\,dv] = 0$ when $v \neq u$, we have

$$E[\nabla f(s)(\nabla f(s))^*] = \frac{1}{|X|}\int_{h\in X}\int (s-u)(s-u)^*\,\Sigma^{-2}\,((2\pi)^d|\Sigma|)^{-1}\exp\!\big(-(s-u)^*\Sigma^{-1}(s-u)\big)\,\omega(u+h)\,\omega(u+h)\,du\,dh.$$

Next, substituting v = u+h, we have

$$E[\nabla f(s)(\nabla f(s))^*] = \frac{1}{|X|}\int_{h\in X}\int (s+h-v)(s+h-v)^*\,\Sigma^{-2}\,((2\pi)^d|\Sigma|)^{-1}\exp\!\big(-(s+h-v)^*\Sigma^{-1}(s+h-v)\big)\,\omega(v)\,\omega(v)\,dv\,dh.$$


Finally, substituting $t = s+h$, with $Y = \{s+h : h \in X\}$ and $|Y| = |X|$, we have

$$
\begin{aligned}
E[\nabla f(s)(\nabla f(s))^*] &= \frac{1}{|Y|}\int_{t\in Y}\int (t-v)(t-v)^*\,\Sigma^{-2}\,((2\pi)^d|\Sigma|)^{-1}\exp\!\big(-(t-v)^*\Sigma^{-1}(t-v)\big)\,\omega(v)\,\omega(v)\,dv\,dt \\
&= \frac{1}{|Y|}\int_{t\in Y} \nabla f(t)(\nabla f(t))^*\,dt, \tag{12}
\end{aligned}
$$

which is recognized as a structure tensor $S_g(s) = \int_{t\in Y} g(s-t)\,\nabla f(t)(\nabla f(t))^*\,dt$ with $g(s-t) = 1/|Y|$. For noise filtering purposes, we can describe this in the Fourier domain by

$$F(S_g)(\xi) = \delta(\xi)\int (2\pi i(\xi-\nu))(2\pi i\nu)^*\,F(f)(\xi-\nu)\,\overline{F(f)(\nu)}\,d\nu,$$

where $\delta(\xi)$ is the Kronecker delta, and we have utilized the fact that the convolution area is restricted to size $|Y|$. The above equation means that we have values only at $\xi = 0$, such that

$$F(S_g)(0) = \int (2\pi\nu)(2\pi\nu)^*\,F(f)(\nu)\,\overline{F(f)(\nu)}\,d\nu, \tag{13}$$

where $\overline{F(f)(\nu)}$ denotes the complex conjugate.

2.2.1 Computing structure tensor

In order to find a solution for equation (11), we need to approximate $E[\nabla f(s)(\nabla f(s))^*]$ using the observed values $F = \{f_1, \ldots, f_n\}$, as described in (13). An efficient way to perform this is to utilize the fast Fourier transform. For this, we now have

$$
\begin{aligned}
F(S_g)(0) &= (2\pi)^2\sum_{\nu}\nu\,\nu^*\,F(f)(\nu)\,\overline{F(f)(\nu)}\,\Delta\nu \\
&= (2\pi)^2\sum_{\nu_i}\nu_i\,\nu_i^*\,\frac{1}{\sqrt{|X|}}\,\mathrm{FFT}(f)(\nu_i)\,\frac{1}{\sqrt{|X|}}\,\overline{\mathrm{FFT}(f)(\nu_i)}\,\Delta\nu \\
&= \frac{(2\pi)^2}{|X|^2}\sum_{\nu_i}\nu_i\,\nu_i^*\,\mathrm{FFT}(f)(\nu_i)\,\overline{\mathrm{FFT}(f)(\nu_i)}, \tag{14}
\end{aligned}
$$

where we have noted that $\Delta\nu = \frac{1}{|X|}$.
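As an illustration of (14), a minimal NumPy sketch for a two-dimensional field on a regular unit-spaced grid might read as follows; the frequency-grid convention of `np.fft.fftfreq` is an assumption of this sketch.

```python
import numpy as np

def structure_tensor_fft(f):
    """Approximate E[grad f (grad f)^T] via equation (14) using the FFT.

    f: 2-D array of field values on a regular grid. Returns the 2x2
    averaged gradient outer product (structure tensor).
    """
    F = np.fft.fft2(f)
    nu1, nu2 = np.meshgrid(np.fft.fftfreq(f.shape[0]),
                           np.fft.fftfreq(f.shape[1]), indexing="ij")
    power = (F * np.conj(F)).real        # FFT(f)(nu_i) times its conjugate
    w = (2 * np.pi) ** 2 / f.size ** 2   # the (2 pi)^2 / |X|^2 factor
    return w * np.array([[np.sum(nu1 * nu1 * power), np.sum(nu1 * nu2 * power)],
                         [np.sum(nu1 * nu2 * power), np.sum(nu2 * nu2 * power)]])
```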

2.2.2 Accounting for noisy samples

We assume the actual observations are corrupted with noise such that

$$z(x) = f(x) + w,$$


where $w$ is i.i.d. zero mean Gaussian measurement noise with $E(w^2) = \sigma_n^2$. As such, for the gradient, we obtain

$$F(\nabla z)(\xi) = 2\pi i\,\xi\,\big(F(f)(\xi) + F(w)(\xi)\big),$$

and for the expected structure we now have

$$E[\nabla z(s)(\nabla z(s))^*] = \sigma_f^2\,\pi^{-d/2}\,2^{-d-1}\,|\Sigma|^{-1/2}\,\Sigma^{-1} + \sigma_n^2\int (2\pi)^2\,\nu\,\nu^*\,d\nu, \tag{15}$$

which shows that the gradient will be dominated by the sensor noise. Under the FFT, the sensor noise will accumulate through $\sigma_n^2\,\pi^2/3$, and if we knew the sensor noise variance, we could try to remove this component from the computed (14). Now, the major problem is with the frequency components whose signal-to-noise ratio is low. Our solution here is to remove those components from the computation. If we do not know the sensor noise accurately enough, a better solution is to use only those frequency components that contribute significantly to the signal part, and to search iteratively for the correct kernel. This is explained below; a code sketch follows the list.

1. Initialize the kernel parameters $\lambda_1$, $\lambda_2$ and $\theta$, such that $\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$, $\Sigma = U\Lambda U^*$, and $U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.
2. Compute the convolution kernel values for the selected frequency components, $F(k)(\nu_i) = \exp(-\frac{1}{2}\nu_i^*\Sigma\nu_i)$.
3. Compute the structure tensor using only the components with significant signal content, i.e., restrict the computation
$$E[\nabla f(s)(\nabla f(s))^*] = \frac{(2\pi)^2}{|X|^2}\sum_{\nu_i}\nu_i\,\nu_i^*\,\mathrm{FFT}(z)(\nu_i)\,\overline{\mathrm{FFT}(z)(\nu_i)}$$
to the components for which $F(k)(\nu_i) \ge 0.1$.
4. Compute $|\Sigma|\Sigma\Sigma = \sigma_f^4\,\big(\pi^d 4^{d+1}\,E[\nabla f(s)(\nabla f(s))^*]\,E[\nabla f(s)(\nabla f(s))^*]\big)^{-1}$.
5. Compute the eigenvalues $d_1, d_2$ and eigenvectors $U$ of $|\Sigma|\Sigma\Sigma$ to find $\lambda_1, \lambda_2$ and to form $\Sigma_{new}$.
6. Compute the Frobenius norms $||\Sigma_{new} - \Sigma||_F$ and $||\Sigma_{new}||_F$; set $\Sigma = \Sigma_{new}$ and return to step 2 if $||\Sigma_{new} - \Sigma||_F > 0.00001 \cdot ||\Sigma_{new}||_F$. The stopping criterion should ensure that the adaptation process has converged close to the final value.
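Under the stated assumptions (d = 2, known σ_f, unit grid), the loop might be sketched as follows; it reuses the hypothetical `sigma_from_matrix_power` helper sketched above and folds the frequency mask of step 3 into the structure tensor computation.

```python
import numpy as np

def adapt_diffusion(z, sigma_f, max_iter=50, tol=1e-5):
    """Iterative anisotropic diffusion kernel adaptation (steps 1-6)."""
    Sigma = np.eye(2)                                   # step 1: isotropic init
    Z = np.fft.fft2(z)
    nu1, nu2 = np.meshgrid(np.fft.fftfreq(z.shape[0]),
                           np.fft.fftfreq(z.shape[1]), indexing="ij")
    for _ in range(max_iter):
        # Step 2: kernel response at each frequency component.
        quad = (Sigma[0, 0] * nu1**2 + 2 * Sigma[0, 1] * nu1 * nu2
                + Sigma[1, 1] * nu2**2)
        mask = np.exp(-0.5 * quad) >= 0.1
        # Step 3: structure tensor from significant components only.
        power = (Z * np.conj(Z)).real * mask
        w = (2 * np.pi) ** 2 / z.size ** 2
        E = w * np.array([[np.sum(nu1 * nu1 * power), np.sum(nu1 * nu2 * power)],
                          [np.sum(nu1 * nu2 * power), np.sum(nu2 * nu2 * power)]])
        # Step 4: |Sigma| Sigma Sigma from equation (11) with d = 2.
        M = sigma_f ** 4 * np.linalg.inv(np.pi ** 2 * 4 ** 3 * E @ E)
        # Step 5: eigendecomposition recovers Sigma_new.
        Sigma_new = sigma_from_matrix_power(M)
        # Step 6: relative Frobenius-norm stopping criterion.
        if np.linalg.norm(Sigma_new - Sigma) <= tol * np.linalg.norm(Sigma_new):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```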


2.2.3 Working with unknown signal variation

Following from equation (11), we observe that, in the kernel adaptation, the white noise variation and the diffusion kernel are intrinsically connected. One solution is to maximize the marginal likelihood (9) with respect to the white noise variation. Since we assume that the sensor noise is also unknown, we propose, at each iteration, to maximize the logarithmic marginal measurement likelihood with respect to the signal variance and sensor noise parameters. If we now define

$$|\bar\Sigma|\,\bar\Sigma^2 = \big(\pi^d\,4^{d+1}\,E[\nabla f(s)(\nabla f(s))^*]\,E[\nabla f(s)(\nabla f(s))^*]\big)^{-1}, \tag{16}$$

we have, for each hypothesis, $\Sigma = \sigma_f\bar\Sigma$. That way, we can replace steps 4 and 5 in the procedure of Section 2.2.2 by equation (16) and by the definition $\Sigma_{new} = \sigma_f^*\bar\Sigma$, where

$$\sigma_f^* = \arg\max_{\sigma_f}\,\max_{\sigma_n}\,\mathcal{N}\!\left(z_A;\, 0,\, \sigma_f^2\,K_{A,\bar\Sigma}K_{A,\bar\Sigma}^* + \sigma_n^2 I_A\right),$$

and $K_{A,\bar\Sigma}$ is the set of diffusion kernels at locations A with diffusion tensor $\bar\Sigma$.

2.2.4 Experiments

In the experiments, we created artificial data using a white noise process with σ_f = 6, anisotropic kernel parameters λ1 = 15², λ2 = 5², and θ = π/3, and sensor noise with σ_n = 0.6, presented in Fig 4.

Fig 4. Artificial data used in the first experiment: (a) true process; (b) noisy observations.

In the first experiment, our objective was to demonstrate how well our iterative adaptation performs when we start with diffusion scale parameters that are much smaller than the true values, and with an isotropic direction. We measured both convergence speed and estimation error. Fig 5 presents the predictive mean at each iteration step. Fig 6 shows the Frobenius norm of the error between the true and estimated diffusion tensors, and the mean squared error of the predictive mean. We observe that the adaptation accuracy is good and the adaptation speed is fast, even though the diffusion is initially assumed isotropic and much smaller.

Fig 5. Gaussian process regression under anisotropic Gaussian kernel adaptation. These images present the predictive mean at iteration rounds 1-4. Initial values for adaptation are λ1 = 1², λ2 = 1², and θ = 0.

45

Page 48: C 650 ACTA - University of Oulujultika.oulu.fi/files/isbn9789526218519.pdf · Professor Jari Juga University Lecturer Anu Soikkeli Professor Olli Vuolteenaho Publications Editor Kirsti

Fig 5. (cont.) Predictive mean at iteration rounds 5 and 10.

Fig 6. Mean squared error and Frobenius norm at different iteration rounds: (a) Frobenius norm of the diffusion tensor error S_adapt − S_true; (b) mean squared error of the predictive mean f_est.

In the second experiment, our objective was to demonstrate how well our iterative adaptation performs when we start with diffusion scale parameters that are much larger than the true values, and with an isotropic direction. We measured the convergence speed of the diffusion kernel, signal variance and signal noise parameters, and the estimation error. Fig 7 presents the predictive mean at each iteration step. Fig 8 presents the Frobenius norm of the error between the true and estimated diffusion tensors, and the mean squared error of the predictive mean. We observe that the adaptation accuracy is good and the adaptation speed is fast, even though the diffusion is initially assumed isotropic and much larger.

Fig 7. Gaussian process regression under anisotropic Gaussian kernel adaptation. These images present the predictive mean at iteration rounds 1-5 and 8. Initial values for adaptation are λ1 = 20², λ2 = 20², and θ = 0.


Fig 8. Mean squared error and Frobenius norm at different iteration rounds: (a) Frobenius norm of the diffusion tensor error; (b) mean squared error of the predictive mean.

In the third experiment, our objective was to demonstrate how well our iterative adaptation performs when the signal variance and sensor noise are also unknown. We measured the convergence speed of the diffusion kernel, signal variance and sensor noise parameters, and the estimation error. Fig 9 presents the predictive mean at each iteration step. Fig 10 shows the Frobenius norm of the error between the true and estimated diffusion tensors, and the mean squared error of the predictive mean. Fig 11 presents the signal variance and sensor noise estimates at each iteration step. We observe that our iterative approach is also tolerant against unknown signal variation and sensor noise, and the adaptation speed is almost as good as if these parameters were known.

Fig 9. Gaussian process regression under anisotropic Gaussian kernel adaptation. These images present the predictive mean at iteration rounds 1 and 2. Initial values for adaptation are λ1 = 1², λ2 = 1², and θ = 0.


Fig 9. (cont.) Predictive mean at iteration rounds 3, 4, 5 and 10.

Fig 10. Mean squared error and Frobenius norm at different iteration rounds: (a) Frobenius norm of the diffusion tensor error; (b) mean squared error of the predictive mean.


Fig 11. (a) Signal variance estimate σ_f and (b) sensor noise estimate σ_n at different iteration rounds. The blue line denotes the estimate, the black line the true value.

In the fourth experiment, our objective was to compare how a simple hill climbing algorithm with a marginal likelihood objective function performs on the same data set. We started with the same initial parameters as in experiments 1 and 3, with known white noise variance and sensor noise. We replaced steps 2-6 of the adaptation procedure from Section 2.2.2 with a local grid search, where the grid sizes were 1 for √λ1 and √λ2, and 10 degrees for θ. Fig 12 presents the predictive mean at each step. Fig 13 shows the Frobenius norm of the error between the true and estimated diffusion tensors, and the mean squared error of the predictive mean. Fig 14 presents the diffusion kernel parameters at each local grid search step. We observe that, for stationary processes, simple hill climbing algorithms result in similar adaptation accuracy; a minimal sketch of such a search is given below.
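The following sketch of such a local grid search uses a hypothetical `nll` oracle returning the negative log marginal likelihood of the data for given kernel parameters; the neighborhood structure is an assumption of this sketch.

```python
import numpy as np
from itertools import product

def hill_climb(z, nll, lam1=1.0, lam2=1.0, theta=0.0,
               step_lam=1.0, step_theta=np.deg2rad(10), max_iter=200):
    """Hill climbing on a local grid over (sqrt(lam1), sqrt(lam2), theta)."""
    cur, best = (lam1, lam2, theta), nll(z, lam1, lam2, theta)
    for _ in range(max_iter):
        # Neighboring grid points: +/- one step in each parameter.
        neighbors = [(max(np.sqrt(cur[0]) + d1 * step_lam, 1e-6) ** 2,
                      max(np.sqrt(cur[1]) + d2 * step_lam, 1e-6) ** 2,
                      cur[2] + dt * step_theta)
                     for d1, d2, dt in product((-1, 0, 1), repeat=3)
                     if (d1, d2, dt) != (0, 0, 0)]
        vals = [nll(z, *nb) for nb in neighbors]
        i = int(np.argmin(vals))
        if vals[i] >= best:              # no improving neighbor: local optimum
            break
        cur, best = neighbors[i], vals[i]
    return cur
```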

Fig 12. Gaussian process regression under anisotropic Gaussian kernel grid search adaptation: predictive mean at steps 1 and 3.


Fig 12. (cont.) Predictive mean at grid search steps 5, 7, 9 and 13. Initial values for adaptation are λ1 = 1², λ2 = 1², and θ = 0.

Fig 13. Mean squared error and Frobenius norm at different grid search steps: (a) Frobenius norm of the diffusion tensor error; (b) mean squared error of the predictive mean.


Fig 14. Diffusion parameter estimates at different grid search steps: (a) square root of the diffusion scale estimate λ1; (b) square root of the diffusion scale estimate λ2; (c) diffusion direction estimate θ. Blue lines denote the estimates, black lines the true values. Notice that λ1 = 5², λ2 = 15², θ = −π/6 results in the same diffusion kernel as λ1 = 15², λ2 = 5², θ = π/3.

These experiments demonstrated that our iterative approach is able to converge close to the true diffusion parameter values without additional value bounding. What is more, the approach is also theoretically justified. In addition, we observed that, combined with a maximum marginal likelihood search for the signal variance and sensor noise parameters, our approach is suitable for realistic modelling applications where we cannot infer, in advance, whether small spatial variation in values is due to heavy diffusion or small white noise variation. Finally, we also observed that, with stationary processes, due to the unimodality of the marginal likelihood w.r.t. the diffusion kernel parameters, simple hill climbing algorithms could be used instead of our approach. However, as stated in the thesis introduction, our target environments are non-stationary, which therefore requires an approach that can adapt multiple diffusion kernels simultaneously. This is introduced in the next section.


2.3 Moving average kernel adaptation

Now, when we move towards location variant diffusion parameters, we observe

$$
\begin{aligned}
\nabla f(s) = &\int \nabla|\Sigma(s)|\,|\Sigma(s)|^{-1.5}\,(2\pi)^{-d/2}\exp\!\big(-0.5\,(s-u)^*\Sigma(s)^{-1}(s-u)\big)\,\omega(u)\,du \\
&+ \int -(s-u)\,\Sigma(s)^{-1}\,((2\pi)^d|\Sigma(s)|)^{-0.5}\exp\!\big(-0.5\,(s-u)^*\Sigma(s)^{-1}(s-u)\big)\,\omega(u)\,du \\
&+ \int (s-u)^*\,\nabla\Sigma(s)\,\Sigma(s)^{-3}\,(s-u)\,((2\pi)^d|\Sigma(s)|)^{-0.5}\exp\!\big(-0.5\,(s-u)^*\Sigma(s)^{-1}(s-u)\big)\,\omega(u)\,du
\end{aligned}
$$

and for the expected value, when d = 1, we have

$$E[\nabla f(s)(\nabla f(s))^*] = \sigma_f^2\,\pi^{-0.5}\,\Sigma(s)^{-1.5}\left(\frac{1}{4} + \frac{3}{8}\,\nabla\Sigma(s)\right).$$

Next, we consider how to estimate $E[\nabla f(s)(\nabla f(s))^*]$ from the observed data. Using a similar procedure as with the global diffusion parameters, we find

$$E[\nabla f(s)(\nabla f(s))^*] = \frac{1}{|Y|}\int_{t\in Y}\int (t-v)(t-v)^*\,\Sigma(s)^{-2}\,((2\pi)^d|\Sigma(s)|)^{-1}\exp\!\big(-(t-v)^*\Sigma(s)^{-1}(t-v)\big)\,\omega(v)\,\omega(v)\,dv\,dt,$$

which is hard to estimate using the observed data. Notice that the integral over u does not correspond to any reasonable outer product ∇f(t)(∇f(t))* unless we assume the diffusion parameter is location invariant, i.e., Σ(t) = Σ(s) for all t. This naturally means that, since the local diffusion parameter Σ(s) can be set arbitrarily, we cannot approximate it based on the other samples y(t) either. However, since the diffusion parameters should reflect the speed of signal variation, we can make the assumption that the diffusion parameter extent should somehow reflect the diffusion strength, i.e., stronger diffusion should also change less locally. We propose two different approaches. The first is based on a structure tensor whose weight function g has the local diffusion tensor shape, similar to [29], [7]. The second is based on computing directional derivatives between observations, forming an outer product from these and the directional vector, and using a weighted average based on the current diffusion tensor shape, similar to [8]. The former approach is used in this thesis.

Now, the expected gradient outer product is approximated by the structure tensor

$$E[\nabla f(s)(\nabla f(s))^*] \approx \int ((2\pi)^d|\Sigma(s)|)^{-0.5}\exp\!\big(-0.5\,(s-t)^*\Sigma(s)^{-1}(s-t)\big)\,\nabla f(t)(\nabla f(t))^*\,dt.$$


2.3.1 Computing structure tensor

To compute the gradient outer product ∇f(t)(∇f(t))*, we can utilize either a finite difference or a Fourier domain approach. Applying the fast Fourier transform, we obtain

$$F(\nabla f\,\nabla f^*)(\xi) = \frac{(2\pi)^2}{|X|^2}\sum_{\nu_i}\nu_i\,(\xi-\nu_i)^*\,\mathrm{FFT}(f)(\nu_i)\,\mathrm{FFT}(f)(\xi-\nu_i).$$

2.3.2 Accounting for noisy samples

For the noise compensation, we can apply a similar approach as previously, removing the irrelevant parts of the signal in the Fourier domain. Now, however, since there are multiple different diffusion parameters, we compute the average value and use it to restrict the signal bandwidth. At the same time, we can estimate the noise level from the frequency components outside the bandwidth. The iterative algorithm is similar to that of Section 2.2.2, with a different method for the structure tensor computation. The algorithm, without accounting for ∇|Σ| and ∇Σ, is given below; a code sketch follows the list.

1. Initialize the same kernel parameters $\lambda_1$, $\lambda_2$ and $\theta$ for all $\Sigma(s) = U\Lambda U^*$.
2. Compute the average diffusion value $\bar\Sigma$ and use it to create the average kernel in the Fourier domain, $F(k)(\nu_i) = \exp(-\frac{1}{2}\nu_i^*\bar\Sigma\nu_i)$.
3. Filter the measurements $z$ to obtain approximately $f$ in the Fourier domain, using $\mathrm{FFT}(f)(\nu_i) = \mathrm{FFT}(z)(\nu_i)$ when $F(k)(\nu_i) \ge 0.1$, and $\mathrm{FFT}(f)(\nu_i) = 0$ otherwise.
4. Create for each $s$ the gradient outer product $\nabla f(s)(\nabla f(s))^*$, using either the finite difference or the frequency domain method.
5. Create the structure tensor to approximate the expected outer product for each $s$, using
$$E[\nabla f(s)(\nabla f(s))^*] \approx \eta\sum_i ((2\pi)^d|\Sigma(s)|)^{-0.5}\exp\!\big(-0.5\,(s-t_i)^*\Sigma(s)^{-1}(s-t_i)\big)\,\nabla f(t_i)(\nabla f(t_i))^*,$$
where $\eta$ is a normalizing factor.
6. Compute $|\Sigma(s)|\Sigma(s)\Sigma(s) = \sigma_f^4\,\big(\pi^d 4^{d+1}\,E[\nabla f(s)(\nabla f(s))^*]\,E[\nabla f(s)(\nabla f(s))^*]\big)^{-1}$.
7. For each $|\Sigma(s)|\Sigma(s)\Sigma(s)$, compute the eigenvalues $d_1, d_2$ and eigenvectors $U$ to find $\lambda_1, \lambda_2$ and to form $\Sigma(s)_{new}$.
8. Stabilize each $\Sigma(s)_{new}$ with $\Sigma(s)_{new} = 0.5\,(\Sigma(s)_{new} + \Sigma(s))$.
9. For each $s$, compute the Frobenius norms $||\Sigma(s)_{new} - \Sigma(s)||_F$ and $||\Sigma(s)_{new}||_F$; set $\Sigma(s) = \Sigma(s)_{new}$ and return to step 2 if $\sum_s ||\Sigma(s)_{new} - \Sigma(s)||_F > 0.00001 \cdot \sum_s ||\Sigma(s)_{new}||_F$.
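A compact sketch of this procedure for d = 2 and known σ_f, again reusing the hypothetical `sigma_from_matrix_power` helper from Section 2.2 and taking finite differences in step 4, might read as follows.

```python
import numpy as np

def adapt_local_diffusion(z, sigma_f, max_iter=50, tol=1e-5):
    """Non-stationary diffusion kernel adaptation (steps 1-9), d = 2.

    z: 2-D observation grid. Returns an (H, W, 2, 2) array of
    per-location diffusion tensors Sigma(s).
    """
    H, W = z.shape
    Sigma = np.tile(np.eye(2), (H, W, 1, 1))                    # step 1
    nu1, nu2 = np.meshgrid(np.fft.fftfreq(H), np.fft.fftfreq(W),
                           indexing="ij")
    I, J = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    for _ in range(max_iter):
        Sbar = Sigma.mean(axis=(0, 1))                          # step 2
        quad = (Sbar[0, 0] * nu1**2 + 2 * Sbar[0, 1] * nu1 * nu2
                + Sbar[1, 1] * nu2**2)
        Z = np.fft.fft2(z) * (np.exp(-0.5 * quad) >= 0.1)       # step 3
        f = np.fft.ifft2(Z).real
        g1, g2 = np.gradient(f)                                 # step 4
        G = np.stack([np.stack([g1 * g1, g1 * g2], -1),
                      np.stack([g1 * g2, g2 * g2], -1)], -2)    # (H,W,2,2)
        Sigma_new = np.empty_like(Sigma)
        for i in range(H):
            for j in range(W):
                Sinv = np.linalg.inv(Sigma[i, j])
                di, dj = I - i, J - j
                q = (Sinv[0, 0] * di**2 + 2 * Sinv[0, 1] * di * dj
                     + Sinv[1, 1] * dj**2)
                w = np.exp(-0.5 * q)
                w /= w.sum()                                    # eta (step 5)
                E = np.einsum("hw,hwij->ij", w, G)
                M = sigma_f**4 * np.linalg.inv(np.pi**2 * 4**3 * E @ E)
                S = sigma_from_matrix_power(M)                  # steps 6-7
                Sigma_new[i, j] = 0.5 * (S + Sigma[i, j])       # step 8
        diff = sum(np.linalg.norm(Sigma_new[i, j] - Sigma[i, j])
                   for i in range(H) for j in range(W))
        norm = sum(np.linalg.norm(Sigma_new[i, j])
                   for i in range(H) for j in range(W))
        Sigma = Sigma_new
        if diff <= tol * norm:                                  # step 9
            break
    return Sigma
```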


2.3.3 Experiments

In the first experiment, our objective was to demonstrate how well our approach adapts to different location dependent diffusion scale parameters. We generated two one-dimensional data sets; in the first one, the diffusion scales followed a sine wave, whereas in the second, a rectangular function. The signal variance σ_f = 6 was assumed to be known, and we started with initial scale values λ = 1 for all locations. We ran tests for both noiseless and noisy observations, with sensor noises σ_n = 0.006 and σ_n = 0.6, respectively. We measured the convergence speed of the diffusion kernels and the estimation error at different iteration rounds. Fig 15 presents the predictive mean, Fig 16 the diffusion scale parameter values, Fig 17 the mean squared error of the predictive mean, and Fig 18 the mean diffusion scale parameter estimation error at different iteration rounds for the sine wave diffusion variation. Fig 19 presents the predictive mean, Fig 20 the diffusion scale values, Fig 21 the mean squared error of the predictive mean, and Fig 22 the mean diffusion scale estimation error at different iteration rounds for the rectangular function diffusion scale variation. We observe that the adapted scale parameters follow the actual scale parameter trend; however, since the structure tensor utilizes diffusion kernels from previous iterations, large local values filter towards average local values. This can be observed in Fig 16 at the wave peaks and in Fig 20 at the step points. This smoothing is, however, necessary so that the adaptation is tolerant towards unknown sensor noise, which can be seen with the noisy data in Fig 16 and Fig 20. From Fig 17 and Fig 21, we observe that, with noisy observations, the predictive accuracy of the adapted model converges close to the accuracy of the true model. Interestingly, with noiseless observations, the predictive accuracy of the adapted model cannot compete with the measurement error. This does not, however, apply to cases where the predictive locations differ from the measurement locations, as introduced in Section 2.4.


Fig 15. Mean function estimation using adapted diffusion kernels at iterations 1, 3 and 5, for noiseless and noisy data. Blue is the true function, green the noisy data, and red the estimated function.


Fig 16. Diffusion scale parameter adaptation where the true scale parameters follow a sine wave function, at iterations 1, 3 and 5 for noiseless and noisy data. Blue are the true parameters, red the estimates.


Fig 17. Mean squared error of the predictive mean f̂ at different iteration rounds: (a) noiseless data; (b) noisy data. The blue line indicates the MSE of the true model, the green line the MSE of the measurements, and the red line the MSE of the adapted model.

Fig 18. Mean squared diffusion scale estimation error at different iteration rounds: (a) noiseless data; (b) noisy data.


Fig 19. Mean function estimation using adapted diffusion kernels at iterations 1, 3 and 5 (noiseless data) and 1, 3 and 8 (noisy data). Blue is the true function, green the noisy data, and red the estimated function.


Fig 20. Diffusion parameter adaptation where the true diffusion parameters follow a rectangular function, at iterations 1, 3 and 5 (noiseless data) and 1, 3 and 8 (noisy data). Blue are the true parameters, red the estimates.


Fig 21. Mean squared error of the predictive mean f̂ at different iteration rounds: (a) noiseless data; (b) noisy data. The blue line indicates the MSE of the true model, the green line the MSE of the measurements, and the red line the MSE of the adapted model.

Fig 22. Mean squared diffusion scale estimation error at different iteration rounds: (a) noiseless data; (b) noisy data.

In the second experiment, our objective was to demonstrate how our approach differs from the marginal likelihood maximization approach. We ran a gradient ascent algorithm for 500 steps with a step size max_i Δλ_i = 1 on the sine wave diffusion scale parameter data set from the first experiment, with initial scale values λ_i = 1 for all i. We measured the convergence accuracy and estimation error at different steps. Fig 23 presents the predictive mean, Fig 24 the diffusion scale parameter values, Fig 25 the mean squared error of the predictive mean, and Fig 26 the mean diffusion scale parameter estimation error at different steps. For noisy data, we observe that gradient ascent, without additional parameter regularization, leads to local optima where neighboring diffusion scales vary remarkably, which naturally introduces discontinuities into the function estimate. One way to overcome this would be to tie the parameters under some process, as was proposed in [6, Section 11.1]. That would, however, introduce new unknown factors into the system.


Fig 23. Mean function estimation using adapted diffusion kernels at steps 100, 300 and 500, for noiseless and noisy data. Blue is the true function, green the noisy data, and red the estimated function.


Fig 24. Diffusion scale parameter adaptation where the true scale parameters follow a sine wave function, at steps 100, 300 and 500. Blue are the true parameters, red the estimates.


Fig 25. Mean squared error of the predictive mean f̂ at different search steps: (a) noiseless data; (b) noisy data. The blue line indicates the MSE of the true model, the green line the MSE of the measurements, and the red line the MSE of the adapted model.

Fig 26. Mean squared diffusion scale estimation error at different search steps: (a) noiseless data; (b) noisy data.

In the third experiment, our objective was to demonstrate how non-stationary adaptation differs from stationary adaptation with two-dimensional non-stationary anisotropic data. We generated data with signal variance σ_f = 6, sensor noise σ_n = 0.6 and diffusion parameters λ1 = 5, λ2 = 3 − 2cos(2πs/N) and θ = 60°, presented in Fig 27. For stationary adaptation, we used the approach described in Section 2.2, together with the Σ(s)_new stabilization.


Fig 27. Artificial data used in the third experiment: (a) true process; (b) noisy observations; (c) diffusion tensors.

We measured convergence speed and estimation accuracy. Fig 28 presents the predictive mean, Fig 29 the diffusion scale parameter values, Fig 30 the mean squared error of the predictive mean, and Fig 31 the mean diffusion scale parameter estimation error at different iteration rounds for the non-stationary, anisotropic process. We observe that, even though the diffusion in the central area is obviously large, due to the high variation in the corners, the stationary adaptation is not able to increase the diffusion scale parameters remarkably. In contrast, the non-stationary adaptation can adapt locally to the data, which provides better estimation accuracy for the diffusion tensors and a smaller prediction error for the spatial process. We also observe that, due to the randomness of the white noise process itself, it is hard to judge whether some local variation follows from a too small diffusion tensor or from the randomness of the phenomenon. This is not observed with stationary data, where we can utilize more data to filter out the randomness in the phenomenon. Finally, we observe that the diffusion scale 5 is also large compared to the rate of change of that parameter.

Fig 28. Mean function estimation using adapted diffusion kernels at iterations 1, 3 and 5, for non-stationary and stationary adaptation.


Fig 29. Diffusion tensor adaptation for non-stationary anisotropic data at iterations 1, 3 and 5, for non-stationary and stationary adaptation.


Fig 30. Mean squared error of the predictive mean f̂ at different iteration rounds: (a) non-stationary adaptation; (b) stationary adaptation. The blue line indicates the MSE of the true model, the green line the MSE of the measurements, and the red line the MSE of the adapted model.

Fig 31. Mean squared diffusion tensor estimation error at each iteration step: (a) non-stationary adaptation; (b) stationary adaptation.

These experiments show that our iterative approach for moving average kernel adaptation can find diffusion scale parameters such that the predictive accuracy of the adapted model converges close to the accuracy of the exact model. However, if the diffusion parameters change rapidly, there is no way to find out the correct values. We also observed that, without an additional process model for the diffusion parameters, simple gradient based maximum marginal likelihood methods easily lead to local optima where adjacent values differ remarkably. These experiments also show that, if the data is naturally non-stationary, our iterative approach for moving average kernel adaptation can provide better results for noisy data than simply adapting a translation invariant model.

2.4 Kernel adaptation under sparse sampling

In sequential sampling, our goal is to iteratively take new measurements such that we can minimize the number of samples required to represent the environment at a certain accuracy. For stationary models, evenly spaced sample locations are, in general, the best solution. However, with non-stationary processes, we expect that the sample density should be higher in highly varying areas.

This kind of optimization problem is NP-hard, and our approach here utilizes simple greedy sampling, such that, at each iteration, we select the sampling location that is expected to provide the maximal increment to the objective function. For the objective function, we utilize the posterior entropy, such that we try to find a sequence of locations X for which

$$h(F \mid Z_k = (z_1, \ldots, z_k),\, X_k = (x_1, \ldots, x_k)) \tag{17}$$

is minimized. In equation (17), (z_1, ..., z_k) is the sequence of measurements and (x_1, ..., x_k) the corresponding sequence of locations. The sequence order might affect the iterative adaptation; however, we show that our approach, based on well modeled adaptation, is less affected by the order.
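For a Gaussian model, the greedy step can be scored through the posterior predictive variance, since the posterior entropy is largest where that variance is largest; a minimal sketch, with a hypothetical `posterior_variance` helper returning the predictive variances at the candidate locations given the samples collected so far:

```python
import numpy as np

def greedy_entropy_sampling(candidates, posterior_variance, measure, n_samples):
    """Greedy sequential design minimizing the posterior entropy (17)."""
    X, Z = [], []
    for _ in range(n_samples):
        var = posterior_variance(candidates, X, Z)   # predictive variances
        x_next = candidates[int(np.argmax(var))]     # max-entropy location
        X.append(x_next)
        Z.append(measure(x_next))
        # Kernel adaptation (Section 2.3.2) can be run here once per
        # new sample before the next selection, as in the experiments.
    return X, Z
```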

Now, in order to compute gradients at each position, we need to interpolate the process at every location using the sparse samples Z_n. For that, we can either utilize Gaussian process regression with the parameter estimates from the previous round, such that

$$\hat f = E[F] + \mathrm{Cov}(F, Z_k \mid \theta_{k-1})\,\big(\mathrm{Cov}(Z_k, Z_k \mid \theta_{k-1}) + \sigma_n^2 I_k\big)^{-1}\big((z_1, \ldots, z_k) - E[F_k]\big),$$

or some simpler interpolation (e.g. linear). Especially with 2D data, we observed that Gaussian process regression with adapted parameter estimates may lead to an unstable system, such that a large length scale in some direction provides smooth interpolation in the same direction, which again increases the length scale. A predefined constant parameter might also affect the adaptation, and for that reason, with 2D data, we interpolated the noisy data directly with linear interpolation. The problem with that approach was that we were not able to remove noise from the signal.


2.4.1 Experiments

In the first experiment, our objective was to motivate the sparse sampling approach. We wanted to confirm that, in order to optimize sensing, sampling should be focused on the areas where the diffusion is smallest. We assumed both the diffusion and noise parameters to be known, and we used the iterative approach for sampling. We generated one-dimensional non-stationary data with signal variance σ_f = 1, sensor noise σ_n = 0.05, and a diffusion scale following a sine function. Fig 32 presents the results after 30 samples, where the total number of locations is 101. We observe that the samples are mostly concentrated in the areas with high variation.

Fig 32. One-dimensional greedy sampling when the diffusion parameters are known: (a) blue is the true signal, green the samples, and red the estimated signal; (b) known diffusion parameters; (c) ordered list of sampling locations.


In the second experiment, our objective was to compare how adaptive sampling differs from the previous experiment where the model was known. In particular, we wanted to confirm whether it is enough to run the diffusion kernel adaptation between steps 2 and 8, as explained in Section 2.3.2, only once after each new sample. We started with unknown noise and diffusion parameters and searched iteratively for the first 30 sampling locations. Fig 33 presents the results after 30 samples. We observe that, even with high sensor noise, we can find values similar to all our sample configurations in the previous section. In addition, the samples concentrate mostly in the highly varying areas, although this is not as obvious as in Fig 32. This is simply because we start with a stationary assumption (all diffusion parameters set to 1), and the first iterations generally provide arbitrary results. In addition, the highest diffusion values are filtered out, which affects the results.

Fig 33. One-dimensional adaptive greedy sampling when the diffusion parameters and noise are unknown: (a) blue is the true signal, green the samples, and red the estimated signal; (b) blue are the true diffusion parameters, red the adaptive estimates; (c) ordered list of sampling locations.


In the third experiment, our objective was to demonstrate how non-stationary adaptation works under a sequential sampling process with two-dimensional non-stationary anisotropic data. We generated noisy and noiseless data with signal variance σ_f = 1, sensor noises σ_n = 0.001 and σ_n = 0.05 for the noiseless and noisy data, respectively, and diffusion parameters λ1 = 3 − 2cos(2πs/N), λ2 = 5 and θ = 0, presented in Fig 34.

Fig 34. Non-stationary anisotropic data used in sequential sampling: (a) true process; (b) noisy data; (c) diffusion tensors.

We started with initial parameters λ1 = λ2 = 1 and θ = 0, and ran the adaptation once after each sample for the first 100 samples. We compared the results to sequential sampling where the diffusion parameters were known in advance. We measured convergence speed and estimation accuracy. Fig 35 presents the predictive mean, Fig 36 the diffusion scale parameter values, Fig 37 the mean squared error of the predictive mean, and Fig 38 the mean diffusion scale parameter estimation error at different sampling rounds for the non-stationary, anisotropic process. Finally, Fig 39 presents the sample locations for the first 100 samples for sequential sampling with known and adapted diffusion parameters. We observe, surprisingly, that sequential sampling with model adaptation attains results comparable to those from sampling with known diffusion parameters. The sample locations are also distributed in a similar manner.

Fig 35. Mean function estimation using adapted diffusion kernels after the 10th, 30th and 80th samples, for noiseless and noisy data.


Fig 36. Diffusion tensor adaptation for non-stationary anisotropic data after the 10th, 30th and 80th samples, for noiseless and noisy data.


Fig 37. Mean squared error of the predictive mean f̂ after different sampling rounds: (a) noiseless data; (b) noisy data. The blue line indicates the MSE for sequential sampling where the diffusion parameters are known, and the red line the MSE for sequential sampling with model adaptation. Green indicates the mean squared measurement error.

Fig 38. Mean squared diffusion tensor estimation error after different sampling rounds: (a) noiseless data; (b) noisy data.


Fig 39. Sample locations selected by the greedy sampling method when the diffusion parameters are known ((a) noiseless data, (b) noisy data) and when they are adapted based on the sampled data ((c) noiseless data, (d) noisy data).

In the fourth experiment, our objective was to demonstrate how non-stationary adaptation works with real data. We used the magnetic field x-component from a small lobby, shown in Fig 1. We ran experiments for noiseless and noisy data, with noise σ_n = 1.5 µT, presented in Fig 40.


Fig 40. Magnetic field x-component in a small lobby: (a) true process; (b) noisy data. Each grid cell represents 0.25 m × 0.25 m, so that the total survey area is 3.5 m × 4.5 m.

We started with initial parameters λ1 = λ2 = 1 and θ = 0, and ran the adaptation once after each sample for the first 100 samples. At each round, we also estimated the most likely signal variation, as explained in Section 2.2.3, which was used to compute the diffusion tensor. We measured convergence speed and estimation accuracy. Fig 41 presents the predictive mean, Fig 42 the diffusion scale parameter values, Fig 44 the signal variation and Fig 43 the mean squared error of the predictive mean at different sampling rounds, for both noiseless and noisy data. Finally, Fig 45 presents the sample locations for the first 100 samples of sequential sampling. We observe that the adaptation finds stable diffusion tensors and signal variance parameters for both noiseless and noisy data. However, for noisy data, high local variations are explained by a higher signal variance rather than by small local diffusion. The sample locations are also distributed in a reasonable manner, such that fewer samples are used for flat areas.


Fig 41. Mean function estimation using adapted diffusion kernels after the 10th, 30th and 80th samples, for noiseless and noisy data.


Fig 42. Diffusion tensor adaptation for the magnetic field data after the 10th, 30th and 80th samples, for noiseless and noisy data.


Fig 43. Mean squared error of the predictive mean f̂ after different sampling rounds: (a) noiseless data; (b) noisy data. The red line indicates the MSE for sequential sampling, and green the mean squared measurement error.

Fig 44. Signal variance estimate after different sampling rounds: (a) noiseless data; (b) noisy data.


Fig 45. Sample locations selected by the greedy sampling method after 100 samples: (a) noiseless data; (b) noisy data.

These adaptive sampling experiments show that our diffusion kernel adaptation suits sparse sampling well, which is the main target of our research. Since the information based sensing criterion favors samples from highly varying areas, it is essential that our diffusion adaptation can direct the sampling in the correct direction. In the next chapter, we describe mutual information sensing criteria that are more general than the entropy of the posterior covariance used here.


3 Quantifying sensing quality

The objective in this chapter is to develop and justify a sensing quality that can serve as a general measure for the prediction quality of a model. As stated in Section 1.3.2, with a proper sensing quality we can compare the quality of different designs of sensing locations. In addition, for action planning, the sensing quality should account for the uncertainty that each sequence of actions imposes on the sensing locations. In the following subsections, we go through related work, followed by the derivation of the mutual information based sensing quality, and show how it also accounts for localization uncertainties.

3.1 Related work

In [31], Guestrin et al. proposed a sensing quality based on the mutual information between observations at selected and not selected locations, i.e., h(Z_S) − h(Z_S | Z_{V∖S}), where we notice that this criterion does not account for the actual phenomenon F. This criterion can be motivated by sensor placement problems, where we can assume that sensing at certain locations provides all information related to the phenomenon at those locations. Such an approach is reasonable if we assume the sensor noise is negligible, or if we can provide so many samples that the individual noise parts can be averaged out. However, for robotic sensing applications, where we need to take multiple measurements at the same locations, this does not fit well. In our studies [11], we proposed a sensing quality based on the assumption that we can take multiple measurements at the same locations. This is given by the mutual information between the current and all possible future observations, i.e., h(Z_{(s_1,...,s_i)}) − h(Z_{(s_1,...,s_i)} | Z_{(s_{i+1},...,s_n)}), where all observations are drawn from locations s ∈ V. This can be seen as a temporal extension of the metric proposed by Guestrin et al. In addition, we can show the relation

$$\lim_{n\to\infty} h(Z_{(s_1,\ldots,s_i)}) - h(Z_{(s_1,\ldots,s_i)} \mid Z_{(s_{i+1},\ldots,s_n)}) = h(Z_{(s_1,\ldots,s_i)}) - h(Z_{(s_1,\ldots,s_i)} \mid F), \tag{18}$$

where we assume the sensor noise is i.i.d., the future sensing locations are distributed uniformly, and F is defined on the locations V. Equation (18) simply means that the expected information gain from knowing all forthcoming measurements is the same as from knowing the model. For presentational purposes, this can be simplified back to h(Z) − h(Z | F); however, if we relax the assumptions so that Z is not necessarily defined on V, we lose the relation (18).


In the robotic SLAM area, Stachniss et al. proposed in [26] an expected information gain based sensing criterion for the SLAM problem. They utilized the mutual information between the joint distribution of map and locations, and the observations, i.e., h(X, F) − h(X, F | Z). We can decompose this as h(X, F) − h(X, F | Z) = h(F) − h(F | Z) + h(X | F) − h(X | F, Z), where h(F) − h(F | Z) defines the model quality and h(X | F) − h(X | F, Z) the localization accuracy. In the online SLAM context, it is well arguable to use both of these metrics to maintain operational capability, just to guarantee that, at every time, our best estimate is not too far from the true location. However, if we are interested only in the modelling (or mapping) quality, it is more reasonable to use only h(F) − h(F | Z), which is based on marginalization over the sensing locations, i.e., $p(f) = \int p(f \mid x)\,p(x)\,dx$ and $p(f \mid z) = \int p(f \mid x, z)\,p(x \mid z)\,dx$. Also, if we have designs A and B with the same marginal model quality h(F) − h(F | Z_A) = h(F) − h(F | Z_B), maximizing argmax(h(X_A | F) − h(X_A | F, Z_A), h(X_B | F) − h(X_B | F, Z_B)) may lead to a design that has larger prior and posterior localization uncertainty. Even though h(F) − h(F | Z) considers only the model uncertainty, it can be shown to also favor previously visited areas. Eventually, this metric can be used as a real exploration criterion, such that the exploration is continued until a certain level of model quality h(F) − h(F | Z) is attained.

3.2 Mutual information criterion

We start with the definition of mutual information between the selected and all possible future measurements. This can be interpreted as the expected information gain that sensing at certain locations would provide for the following measurements, given by

$$h(Z_{(s_1,\ldots,s_i)}) - h(Z_{(s_1,\ldots,s_i)} \mid Z_{(s_{i+1},\ldots,s_n)}) = h(Z_{(s_{i+1},\ldots,s_n)}) - h(Z_{(s_{i+1},\ldots,s_n)} \mid Z_{(s_1,\ldots,s_i)}). \tag{19}$$

Next, we will prove that, under certain conditions, (19) is given by h(Z) − h(Z | F).

Proposition 3.2.1. Let us denote $Z := Z_{(s_1,\ldots,s_i)}$, and for $Z_{(s_{i+1},\ldots,s_n)}$ we require all $s_{i+1}, \ldots, s_n \in V$. Next, let us state each observation as $z(s) = f(s) + w$, where $w$ is i.i.d. Gaussian sensor noise and $f(s)$ is a Gaussian process at location $s$. Let us denote $F := \{F(s) : s \in V\}$. If the $Z_{(s_{i+1},\ldots,s_n)}$ are evenly distributed among the locations $s \in V$, we have

$$\lim_{n\to\infty} h(Z_{(s_1,\ldots,s_i)}) - h(Z_{(s_1,\ldots,s_i)} \mid Z_{(s_{i+1},\ldots,s_n)}) = h(Z) - h(Z \mid F). \tag{20}$$

Proof. Given $Z := Z_{(s_1,\ldots,s_i)}$, we now only need to prove

$$\lim_{n\to\infty} h(Z_{(s_1,\ldots,s_i)} \mid Z_{(s_{i+1},\ldots,s_n)}) = h(Z \mid F). \tag{21}$$


To simplify, let us first consider a conditional density $h(Z_1 \mid Z_2, \ldots, Z_n)$, where all measurements are at a certain location $s$. Now, we have

$$h(Z_1 \mid Z_2, \ldots, Z_n) = \int\!\cdots\!\int h(Z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n)\,p(z_2, \ldots, z_n)\,dz_2\cdots dz_n,$$

and from the definition $Z = F + W$, we find

$$h(Z_1 \mid Z_2, \ldots, Z_n) = \int\!\cdots\!\int h(Z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n)\int p(F = f)\,p(W_2 = z_2 - f, \ldots, W_n = z_n - f)\,df\,dz_2\cdots dz_n.$$

This simply means that the observations can be explained by different combinations of the sensor noise and the spatial function. Now, assuming certain values $f$ and $w_2, \ldots, w_n$, we have

$$h(Z_1 \mid Z_2, \ldots, Z_n) = \int\!\int\!\cdots\!\int h(Z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n)\cdot p(F = f)\,p(W_2 = z_2 - f, \ldots, W_n = z_n - f)\,dz_2\cdots dz_n\,df, \tag{22}$$

and for the conditional density $p(Z_1 = z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n)$, we have

$$
\begin{aligned}
p(Z_1 = z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n) &= \int p(F = \xi \mid Z_2 = z_2, \ldots, Z_n = z_n)\,p(W = z_1 - \xi \mid Z_2 = z_2, \ldots, Z_n = z_n)\,d\xi \\
&= \int p(F = \xi \mid Z_2 = z_2, \ldots, Z_n = z_n)\,p(W = z_1 - \xi)\,d\xi,
\end{aligned}
$$

and from the consistency of the normal posterior distribution, we have

$$
\begin{aligned}
\lim_{n\to\infty} p(Z_1 = z_1 \mid Z_2 = z_2, \ldots, Z_n = z_n) &= \int \delta(\xi - f)\,p(W = z_1 - \xi)\,d\xi \\
&= p(W_1 = z_1 - f) \\
&= p(Z_1 = z_1 \mid F = f). \tag{23}
\end{aligned}
$$

Substituting (23) into (22), we have

$$
\begin{aligned}
\lim_{n\to\infty} h(Z_1 \mid Z_2, \ldots, Z_n) &= \int h(Z_1 \mid F = f)\,p(F = f)\int\!\cdots\!\int p(W_2 = z_2 - f, \ldots, W_n = z_n - f)\,dz_2\cdots dz_n\,df \\
&= \int h(Z_1 \mid F = f)\,p(F = f)\,df. \tag{24}
\end{aligned}
$$

Finally, assuming that our future observations cover all sampling locations uniformly, this result generalizes to (21).


Next, given this sensing quality in the form h(Z) − h(Z|F) = h(F) − h(F|Z), we can provide conditions under which it has different useful properties. First, we notice that it is non-decreasing, which naturally arises from the relationship between conditional and marginal entropy. Next, we will prove that under certain conditions this sensing quality is also submodular. The benefit of a non-decreasing submodular function is that, in general, it makes optimization easier, so that computationally feasible algorithms are able to find efficient solutions, see e.g. [15], [14], [31], [21].
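For jointly Gaussian variables, all of these entropies reduce to log-determinants, so the sensing quality is cheap to evaluate numerically. The following minimal sketch illustrates this; for illustration only, it assumes a squared-exponential covariance in place of the diffusion-kernel construction used in this thesis, and the names rbf_cov and sensing_quality are ours:

import numpy as np

def rbf_cov(xs, ys, sigma_f=1.0, ell=1.0):
    # Squared-exponential covariance between two sets of 1-D locations.
    d = np.subtract.outer(np.asarray(xs, float), np.asarray(ys, float))
    return sigma_f**2 * np.exp(-0.5 * (d / ell)**2)

def sensing_quality(design, sigma_n=0.1, ell=1.0):
    # q(A) = h(Z_A) - h(Z_A | F) = h(F) - h(F | Z_A): the Gaussian entropies
    # reduce to log-determinants, and the (2*pi*e)^k factors cancel.
    A = np.asarray(design, float)
    k = len(A)
    K = rbf_cov(A, A, ell=ell) + sigma_n**2 * np.eye(k)
    return 0.5 * (np.linalg.slogdet(K)[1] - k * np.log(sigma_n**2))

V = np.linspace(0.0, 16.0, 17)    # candidate sensing locations
print(sensing_quality([0.0, 4.0, 8.0, 12.0, 16.0]))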

Proposition 3.2.2. Let us use similar assumptions as in Proposition 3.2.1, except that we now also require for Z_{(s_1,...,s_i)} that s_1,...,s_i ∈ V. Then the sensing quality h(F) − h(F|Z), following from Proposition 3.2.1, is a submodular function.

Proof. In order to prove the proposition, we first transform the sequence of sensing locations into a multiset by observing that the order does not matter. This follows directly from

Z(s, t) = F(s)+W,

where s denotes location and t time. This also means

h(Z(s,t_1)|Z(s,t_2)) = h(Z(s,t_2)|Z(s,t_1))

and

h(Z(s,t_1)) − h(Z(s,t_1)|Z(s,t_2)) > 0.

Now, given two multisets A ⊆ B, we denote the measurement random variables at those by Z_A and Z_B. If we denote the sensing quality by q(C) = h(Z_C) − h(Z_C|F), then for the increments we have

q({s} ∪ C) − q(C) = h(Z_s,Z_C) − h(Z_s,Z_C|F) − (h(Z_C) − h(Z_C|F))
                  = h(Z_s|Z_C) − h(Z_s|Z_C,F)
                  = h(Z_s|Z_C) − h(Z_s|F),   (25)

where the last line (25) follows from conditional independence, i.e., if we know the spatial process at some locations, measurements from those same locations do not provide any additional information. This also shows that the elements of the multiset C must be restricted to V. Next, since A ⊆ B implies h(Z_s|Z_A) ≥ h(Z_s|Z_B), we observe

h(Z_s|Z_A) − h(Z_s|F) ≥ h(Z_s|Z_B) − h(Z_s|F)
⇔ q({s} ∪ A) − q(A) ≥ q({s} ∪ B) − q(B),

which is the definition of submodularity.
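As a consequence, the classical greedy heuristic of Nemhauser et al. applies: for a non-decreasing submodular quality under a cardinality constraint, greedily adding the location with the largest marginal gain yields a (1 − 1/e)-approximation. A sketch, reusing sensing_quality and V from the example above:

def greedy_design(candidates, k, quality):
    # Greedily add the location with the largest marginal gain
    # quality(A + [s]) - quality(A); submodularity makes this near-optimal.
    A = []
    for _ in range(k):
        rest = [s for s in candidates if s not in A]
        A.append(max(rest, key=lambda s: quality(A + [s]) - quality(A)))
    return A

print(greedy_design(list(V), 5, sensing_quality))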


In order to compare how different sensing qualities are suited for modelling purposes, we can ask how well the obtained sensing locations help in modelling the true phenomenon behind the observations. This can be quantified by the expected density

E_{F_true}[p(F_true|A)] = ∫ p(f_true|A) p(f_true) df_true,

where

p(f_true|A) = ∫ p(f_true|z_A) p(z_A) dz_A

is the density marginalized over measurements z_A drawn from the true phenomenon f_true. In the following, we prove that the best result is obtained when optimizing w.r.t. the h(F) − h(F|Z_A) sensing quality.

Proposition 3.2.3. argmax_A E_{f_true}[p(F_true|A)] = argmax_A h(F) − h(F|Z_A).

Proof. Let us first state that, given any observations z_A, our model provides the posterior distribution p(f|z_A). Next, if we assume these observations are drawn from the phenomenon f_true, we have

p(z_A) = p(z_A|f_true) = N(z_A; X_A f_true, σ_n² I_A),

where X_A is the design matrix. As such, we have

p(f|A) = ∫ p(f|z_A) p(z_A|f_true) dz_A,

where p(f|z_A) is the density of the posterior distribution, and p(z_A|f_true) is the density of measurements known to originate from f_true. For this distribution, we have

p(f|A) = N(f; m, c),

where

m = C(V,A)(C(A,A) + σ_n² I_A)^{−1} X_A f_true

and

c = C(V,V) − C(V,A)(C(A,A) + σ_n² I_A)^{−2} C(A,A) C(A,V),

where C(i,j) is the covariance matrix between elements at locations i and j, and we have assumed a zero a priori mean process, i.e., E[F] = 0. Next, integrating over all possible true realizations of the process, we have

∫ p(f_true|A) p(f_true) df_true = (2π)^{−|V|/2} |d|^{−1/2},   (26)


where

d = 2 (C(V,V) − C(V,A)(C(A,A) + σ_n² I_A)^{−1} C(A,V)).

We observe that the expected density (26) attains its highest value when the determinant of the posterior covariance is smallest, which is obtained when h(F|Z_A) is minimal.
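This criterion is also directly computable. Reusing the rbf_cov sketch from earlier in this section, the logarithm of the expected density (26) is a log-determinant of the matrix d, so candidate designs can be compared numerically (the function name is ours):

def log_expected_density(locations, design, sigma_n=0.1, ell=1.0):
    # log of (26): -(1/2) log |d| - (|V|/2) log(2*pi), where d is twice the
    # posterior covariance of F given measurements at the design locations.
    C_VV = rbf_cov(locations, locations, ell=ell)
    C_VA = rbf_cov(locations, design, ell=ell)
    C_AA = rbf_cov(design, design, ell=ell) + sigma_n**2 * np.eye(len(design))
    post = C_VV - C_VA @ np.linalg.solve(C_AA, C_VA.T)
    return (-0.5 * np.linalg.slogdet(2.0 * post)[1]
            - 0.5 * len(locations) * np.log(2 * np.pi))

print(log_expected_density(V, [0.0, 4.0, 8.0, 12.0, 16.0]))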

Finally, we consider the algebra over the mutual information sensing quality. For two variables Z_A and Z_B, and their joint distribution, we have

q(A∪B) = h(Z_A,Z_B) − h(Z_A,Z_B|F)
       = h(Z_A|Z_B) − h(Z_A|F,Z_B) + h(Z_B) − h(Z_B|F)
       =: q(A|B) + q(B).

Next, we also observe that if Z_A and Z_C are conditionally independent given Z_B, then for the joint distribution we have

p(zA,zB,zC) = p(zA|zB)p(zC|zB)p(zB),

and for the sensing quality, we have

q(A∪B∪C) = q(A|B)+q(C|B)+q(B).

3.2.1 Experiments

In the experiments, our objective was to compare how h(F) − h(F|Z_A) performs with respect to two other commonly used sensing qualities: the mutual information between selected and not selected measurements, h(Z_{V\A}) − h(Z_{V\A}|Z_A), and the entropy over the selected measurements, h(Z_A), where we note that maximizing w.r.t. h(Z_A) is the same as minimizing w.r.t. the conditional entropy h(Z_{V\A}|Z_A). Our candidate set contained seventeen locations, and a feasible design five locations. In the experiments, we assumed a one-dimensional stationary Gaussian process that follows from diffusion of white noise with σ_f = 1, and we measured the predictive model performance of the optimal designs with different sensor noise and diffusion scale parameters. Fig 46 presents the root mean squared errors of the predictive distributions for each sensing quality, given by

RMSE(f) = sqrt( ∑_{s∈V} E_{f(s)}[ (f(s) − f_true(s))² ] ),   (27)

where f denotes the predictive distribution F|Z_A = z_A. For clarity, Fig 47 presents the logarithmic values of the root mean squared errors. We observe that, in contrast to expectations, h(Z_{V\A}) − h(Z_{V\A}|Z_A) provides in most cases the best results in terms of RMSE. h(Z_{V\A}) − h(Z_{V\A}|Z_A) favors sampling locations in the center area, as visualized in Figs 49-51, such that the least dependent locations, typically far away from each other, benefit the most from the observations. This was explained in [31] as not wasting information when the samples are from the center area. However, for the predictive distribution, looking at errors independently at each location does not make sense. Fig 48 presents the logarithmic densities of the true function under the predictive distribution. From the densities we observe that the optimal set with the sensing quality h(F) − h(F|Z_A) provides the best results, as expected from Proposition 3.2.3. This can be explained by the posterior covariance: the maximum of the h(F) − h(F|Z_A) sensing quality, denoted here by A*, provides the smallest determinant of the posterior covariance cov(F,F|Z_{A*}). When the maximum of the h(Z_{V\A}) − h(Z_{V\A}|Z_A) sensing quality, denoted here by A+, provides a better RMSE, it is required that the trace of the posterior covariance cov(F,F|Z_{A+}) is smaller than the trace of cov(F,F|Z_{A*}). We notice that the RMSE values were smaller with h(F) − h(F|Z_A) when extrapolation was much harder compared to interpolation, as can be seen in Fig 50. This means that the sampling corners are less dependent on the surrounding values. This also resulted in a much larger difference between the densities.

(a) Sensor noise is 0.001 (b) Sensor noise is 0.1

Fig 46. Root mean squared errors, RMSE(f), averaged using 100000 sample vectors z_A drawn from different true functions f_true. The x-axis presents different diffusion scale values.

89

Page 92: C 650 ACTA - University of Oulujultika.oulu.fi/files/isbn9789526218519.pdf · Professor Jari Juga University Lecturer Anu Soikkeli Professor Olli Vuolteenaho Publications Editor Kirsti

(a) Sensor noise is 0.001 (b) Sensor noise is 0.1

Fig 47. Logarithmic root mean squared errors, log(RMSE(f)), averaged using 100000 sample vectors z_A drawn from true functions f_true. Values for the different sensing qualities are scaled so that for h(F) − h(F|Z_A) the value is always zero. The x-axis presents different diffusion scale values.

(a) Sensor noise is 0.001 (b) Sensor noise is 0.1

Fig 48. Logarithmic densities, log(p(f_true|z_A)), averaged using 100000 sample vectors z_A drawn from true functions f_true. Values for the different sensing qualities are scaled so that for h(F) − h(F|Z_A) the value is always zero. The x-axis presents different diffusion scale values.

90

Page 93: C 650 ACTA - University of Oulujultika.oulu.fi/files/isbn9789526218519.pdf · Professor Jari Juga University Lecturer Anu Soikkeli Professor Olli Vuolteenaho Publications Editor Kirsti

(a) q(A) = h(F)−h(F |ZA) and σn = 0.001. (b) q(A) = h(F)−h(F |ZA) and σn = 0.1.

(c) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.001. (d) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.1.

(e) q(A) = h(ZA) and σn = 0.001. (f) q(A) = h(ZA) and σn = 0.1.

Fig 49. True signal, predictive mean and selected observations with the optimal designs that maximize different sensing quality functions. The diffusion length-scale parameter was 1. The x-axis presents locations.

91

Page 94: C 650 ACTA - University of Oulujultika.oulu.fi/files/isbn9789526218519.pdf · Professor Jari Juga University Lecturer Anu Soikkeli Professor Olli Vuolteenaho Publications Editor Kirsti

(a) q(A) = h(F)−h(F |ZA) and σn = 0.001. (b) q(A) = h(F)−h(F |ZA) and σn = 0.1.

(c) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.001. (d) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.1.

(e) q(A) = h(ZA) and σn = 0.001. (f) q(A) = h(ZA) and σn = 0.1.

Fig 50. True signal, predictive mean and selected observations with the optimal designs that maximize different sensing quality functions. The diffusion length-scale parameter was 2.5. The x-axis presents locations.

92

Page 95: C 650 ACTA - University of Oulujultika.oulu.fi/files/isbn9789526218519.pdf · Professor Jari Juga University Lecturer Anu Soikkeli Professor Olli Vuolteenaho Publications Editor Kirsti

(a) q(A) = h(F)−h(F |ZA) and σn = 0.001. (b) q(A) = h(F)−h(F |ZA) and σn = 0.1.

(c) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.001. (d) q(A) = h(ZV\A)−h(ZV\A|ZA) and σn = 0.1.

(e) q(A) = h(ZA) and σn = 0.001. (f) q(A) = h(ZA) and σn = 0.1.

Fig 51. True signal, predictive mean and selected observations with the optimal designs that maximize different sensing quality functions. The diffusion length-scale parameter was 5. The x-axis presents locations.

These experiments showed that, when considering the quality of the predictive distribution, optimizing w.r.t. h(F) − h(F|Z_A) is the best solution. This can also be observed directly by noticing that minimizing h(F|Z_A) simply means the lowest uncertainty in the predictions.

3.3 Accounting for localization uncertainties

We start with the definition of the sensing quality h(F) − h(F|Z_A), where A denotes the design of sensing locations, as in the previous section. However, if we cannot guarantee this design, we denote by x_A some actually realized set of sensing locations. Now, the sensing quality can be defined through the marginal densities p(f) = ∫ p(f|x_A) p(x_A) dx_A and p(f|z_A) = ∫ p(f|x_A,z_A) p(x_A|z_A) dx_A. Notice that if the design did not contain any uncertainties, for the posterior sensing distribution we would have p(x_A|z_A) = δ(A − x_A). In SLAM exploration, see e.g. [26, 27], the objective is to find a sequence of actions that minimizes both modelling and localization uncertainties. Using these marginal densities, this can be decomposed by

h(X_A,F) − h(X_A,F|Z_A) = h(X_A|F) − h(X_A|F,Z_A) + h(F) − h(F|Z_A),   (28)

where h(X_A|F) − h(X_A|F,Z_A) and h(F) − h(F|Z_A) are the expected information gains for localization and modelling, respectively. Even though the localization part is negligible for modelling, when optimizing w.r.t. h(F) − h(F|Z_A) one also implicitly minimizes localization uncertainty where necessary. In the following, we prove this and also give optimization bounds.

Lemma 3.3.1. Assuming h(X_A|F) = h(X_A) and h(F|X_A) = h(F), we have h(F) − h(F|Z_A) = h(F) − h(F|X_A,Z_A) − h(X_A|Z_A) + h(X_A|F,Z_A).

Proof. From the expected information gain h(X_A,F) − h(X_A,F|Z_A), we have the equality

h(X_A|F) − h(X_A|F,Z_A) + h(F) − h(F|Z_A) = h(X_A) − h(X_A|Z_A) + h(F|X_A) − h(F|X_A,Z_A),

and using the assumptions, we find

h(F) − h(F|Z_A) = h(F) − h(F|X_A,Z_A) − h(X_A|Z_A) + h(X_A|F,Z_A).

In Lemma 3.3.1, the assumptions simply mean that without observing the environment, we can decrease neither the localization nor the modelling uncertainty.


Corollary 3.3.2. The optimal solution for the sensing quality h(F) − h(F|Z_A) balances between the expected sensing quality and a well predictable environment.

Proof. The proof is obtained by observing that h(F) − h(F|X_A,Z_A) is the expected sensing quality, and that for a well predictable area the marginal localization uncertainty X_A|Z_A does not benefit much from knowing F, i.e., h(X_A|Z_A) − h(X_A|F,Z_A) is small.

Corollary 3.3.2 means that, in addition to exploring new information, the optimal design w.r.t. h(F) − h(F|Z_A) favors visiting areas that have been previously visited or are otherwise well defined.

Proposition 3.3.3. With discrete sensing path distributions, the sensing quality is tightly bounded by h(F) − h(F|X_A,Z_A) ≥ h(F) − h(F|Z_A) ≥ h(F) − h(F|X_A,Z_A) − H(X_A|Z_A).

Proof. First, from the law of conditional entropy, we have h(F|X_A,Z_A) ≤ h(F|Z_A), which leads to h(F) − h(F|X_A,Z_A) ≥ h(F) − h(F|Z_A). Next, for discrete distributions, we have H(X_A|F,Z_A) ≥ 0, which, combined with Lemma 3.3.1, gives us h(F) − h(F|Z_A) ≥ h(F) − h(F|X_A,Z_A) − H(X_A|Z_A). Now, in order to prove that the bounds are tight, we need to show that under some conditions the sensing quality attains these values.

We first assume our environment is highly informative, such that, given any measurement z_j at location x_j from some process f, we would have for the posterior probability P(x_i|f,z_j) = δ_ij, where δ_ij is the Kronecker delta. As such, we observe H(X_A|F,Z_A) = 0, which gives us the tight lower bound.

Next, let us consider totally uninformative environments, such that for any measurement z and any x_i, x_j we have P(f|x_i,z) = P(f|x_j,z). As such, we find

P(x_i|f,z) = p(f|x_i,z) P(x_i|z) / ∑_j p(f|x_j,z) P(x_j|z)   (29)
           = (p(f|x_i,z) / p(f|x_i,z)) · (P(x_i|z) / ∑_j P(x_j|z))
           = P(x_i|z),

which gives us H(X_A|F,Z_A) = H(X_A|Z_A). As such, we have h(F) − h(F|Z_A) = h(F) − h(F|X_A,Z_A), which is the tight upper bound.

Proposition 3.3.3 means that among designs A, B of approximately similar expected mutual information h(F) − h(F|X_A,Z_A) ≈ h(F) − h(F|X_B,Z_B) (expectation over the path distribution), in informative environments we should minimize the posterior marginal localization uncertainty, min_{S∈{A,B}} H(X_S|Z_S). We also observe that when exploring non-informative (flat) areas, we do not have to worry about the sensing path distribution. In contrast, when exploring informative areas, it is beneficial to also have a small a priori sensing path uncertainty H(X_S), since H(X_S|Z_S) ≤ H(X_S).

3.3.1 Sensing path distributions

Let us state that for each design A we have an a priori path distribution with density p(x_A). The posterior path density is now given by

p(x_A|z_A) = p(z_A|x_A) p(x_A) / p(z_A),

with the marginal measurement density

p(z_A) = ∫ p(z_A|x_A) p(x_A) dx_A

and the marginal likelihood (marginalized over f)

p(z_A|x_A) = N(z_A; E[F(x_A)], cov(F(x_A),F(x_A)) + σ_n² I),

where F(x_A) is the Gaussian process marginalized to the locations x_A.

Now, the problem in computing the sensing quality h(F) − h(F|Z_A) directly arises from the approximations to the posterior density

p(f|z_A) = ∫ p(f|x_A,z_A) p(x_A|z_A) dx_A,

which would require either Monte Carlo methods and careful sample density scaling to attain the integral

h(F|Z_A = z_A) = −∫ p(f|z_A) log(p(f|z_A)) df,

or some continuous approximation to the model, such as [32]. However, instead of directly attacking the sensing quality, we can utilize Lemma 3.3.1, such that we integrate over h(F|X_A = x_A, Z_A = z_A), where p(f|x_A,z_A) is the posterior Gaussian model density. For the lemma, we also need the posterior path distribution

p(x_A|f,z_A) = p(z_A|f,x_A) p(x_A) / p(z_A|f),

with the marginal measurement density

p(z_A|f) = ∫ p(z_A|f,x_A) p(x_A) dx_A

and the measurement likelihood

p(z_A|f,x_A) = N(z_A; f(x_A), σ_n² I).
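For a discrete set of candidate paths, this posterior path distribution is a direct Bayes computation. A minimal sketch under the Gaussian measurement likelihood above (the discretization and all names are illustrative assumptions):

import numpy as np

def path_posterior(z, f, paths, prior, noise_var):
    # P(x_A | f, z_A) over discrete candidate paths (tuples of location
    # indices): Gaussian likelihood N(z_A; f(x_A), sigma_n^2 I) times the
    # prior, normalized in log-space for numerical stability.
    logp = np.array([-0.5 * np.sum((z - f[list(p)])**2) / noise_var
                     + np.log(prior[i]) for i, p in enumerate(paths)])
    w = np.exp(logp - logp.max())
    return w / w.sum()

f = np.array([0.3, -1.2, 0.8, 0.1, 1.5, -0.4, 0.9])  # realization on V = {1,...,7}
paths = [(0, 3, 6), (0, 3, 4), (0, 2, 5)]            # candidate 3-step paths
prior = np.array([0.5, 0.3, 0.2])
z = f[[0, 3, 6]] + 0.1 * np.random.randn(3)          # measurements on the true path
print(path_posterior(z, f, paths, prior, 0.01))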

3.3.2 Experiments

In the experiments, our objective was to compare how h(F) − h(F|Z_A) performs with respect to the sensing quality h(F,X) − h(F,X|Z_A) typically utilized in SLAM exploration. All experiments were conducted with V = {1,...,7} and design size |A| = 3, such that x_A is a 3-dimensional vector. This configuration made it possible to draw enough sample vectors f and z to approximate the entropies H(X|Z) and H(X|F,Z).

In our motion model, we assumed the initial location was 1 and that at each iteration we are able to take one of the actions a_i ∈ {−6,−3,0,3,6}. The motion uncertainty was modelled by

p(x_{i+1}|x_i,a_i) = η(x_i,a_i) N(x_{i+1}; x_i + a_i, 0.25|a_i|),   (30)

where η(x_i,a_i) is a normalization constant such that ∑_{x_{i+1}} p(x_{i+1}|x_i,a_i) = 1. This model implicitly constrains the actions: if an action would lead far over the region border, the conditional probability concentrates on the border locations.
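A sketch of this discretized motion model: the Gaussian displacement density is renormalized over the finite location set, so actions that would lead over the border concentrate their mass on the border states (the function name is illustrative):

import numpy as np

def motion_model(x, a, locations, scale=0.25):
    # p(x_{i+1} | x_i, a_i) of (30): Gaussian noise with variance 0.25|a_i|,
    # renormalized over the finite set of locations.
    locations = np.asarray(locations, float)
    if a == 0:
        p = (locations == x).astype(float)   # zero action: stay put
    else:
        p = np.exp(-0.5 * (locations - (x + a))**2 / (scale * abs(a)))
    return p / p.sum()

V = np.arange(1, 8)                          # locations 1,...,7
print(motion_model(1, 6, V))                 # mass concentrates near the border at 7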

In the first experiment, we considered action selection for the different sensing qualities in a smooth environment, demonstrated in Fig 53, with diffusion length-scale 2, white noise deviation σ_f = 1 and sensor noise deviation σ_n = 0.001. Fig 52 shows the sensing qualities and path entropies for different action designs. First, we observe that the expected sensing quality h(F) − h(F|X_A,Z_A) is slightly higher for the actions (3,6) and (6,6) compared to (6,−3), due to a smaller a priori path uncertainty. Interestingly, for h(F,X_A) − h(F,X_A|Z_A), the actions (3,3), (3,6) and (6,−3) are closely similar in quality, whereas for h(F) − h(F|Z_A), the action (3,6) is clearly the best selection. This difference can be explained by looking at the H(X_A), H(X_A|Z_A) and H(X_A|F,Z_A) values at those actions. When the path entropies H(X_A|F,Z_A) are small, i.e., the environment is informative for localization, the sensing quality h(F) − h(F|Z_A) favors actions that have a small marginal posterior entropy H(X_A|Z_A). In similar conditions, the joint quality h(F,X_A) − h(F,X_A|Z_A), in contrast, favors actions whose a priori path entropy H(X_A) is large.


(a) Sensing quality metrics for different actions.

(b) Path entropies for different actions.

Fig 52. Sensing qualities and path entropies for different actions, for an example function with diffusion scale 2 and measurement noise deviation 0.001.


Fig 53. Example function realization and noisy samples from that function when the diffusion scale was 2 and the measurement noise deviation 0.001.

In the second experiment, we considered how the action selection would differ when noise makes the environment less informative for localization. The diffusion length-scale was again 2; however, the sensor noise was increased to σ_n = 0.1, as demonstrated in Fig 54. Figure 55 shows the sensing qualities and path entropies for different action designs. We observe that for the sensing quality h(F) − h(F|Z_A), the action (3,6) again provides the best results; however, for the joint quality h(F,X_A) − h(F,X_A|Z_A), the action (6,−3) is no longer favorable, due to the smaller difference between H(X_A) and H(X_A|F,Z_A).

Fig 54. Example function realization and noisy samples from that function when the diffusion scale was 2 and the measurement noise deviation 0.1.


(a) Sensing quality metrics for different actions.

(b) Path entropies for different actions.

Fig 55. Sensing qualities and path entropies for different actions, for an example function with diffusion scale 2 and measurement noise deviation 0.1.


In the third experiment, we considered how the action selection would perform when the environment contains a lot of identifiable variation and the sensor noise is small. The diffusion length-scale was now 0.5 with sensor noise σ_n = 0.001, as demonstrated in Fig 56. Figure 57 shows the sensing qualities and path entropies for different action designs. Compared to the previous experiments, we observe that the path entropy H(X_A|Z_A) benefits from returning back to the middle area with the action (6,−3). This simply means that, with more variation, observations from the center area fit less well to the corner areas, and respectively, observations from the corner areas get less weight in the center area due to the wide marginal measurement distribution p(z_A). For the joint quality h(F,X_A) − h(F,X_A|Z_A), the action (6,−3) is the best option by a small margin, since the increment H(X_A) − H(X_A|F,Z_A) is now large due to a small posterior entropy H(X_A|F,Z_A).

Fig 56. Example function realization and noisy samples from that function when the diffusion scale was 0.5 and the measurement noise deviation 0.001.


(a) Sensing quality metrics for different actions.

(b) Path entropies for different actions.

Fig 57. Sensing qualities and path entropies for different actions, for an example function with diffusion scale 0.5 and measurement noise deviation 0.001.


In the fourth experiment, we considered how the action selection would change in an environment with a lot of variation when the sensor noise increases. The diffusion length-scale was again 0.5; however, the sensor noise was increased to σ_n = 0.1, as demonstrated in Fig 58. Fig 59 shows the sensing qualities and path entropies for different designs. We observe that for the joint quality h(F,X_A) − h(F,X_A|Z_A), the action (6,−3) is again favorable, even though for the sensing quality h(F) − h(F|Z_A) this is clearly not the best option. Compared to Fig 55, we observe that at the action (6,−3) the biggest difference is in the posterior entropy H(X_A|F,Z_A), which now penalizes actions with a large a priori path uncertainty less.

Fig 58. Example function realization and noisy samples from that function when the diffusion scale was 0.5 and the measurement noise deviation 0.1.


(a) Sensing quality metrics for different actions.

(b) Path entropies for different actions.

Fig 59. Sensing qualities and path entropies for different actions, for an example function with diffusion scale 0.5 and measurement noise deviation 0.1.


In these experiments, we showed that h(F) − h(F|Z_A) obtains similarly favorable behavior to the joint quality h(F,X_A) − h(F,X_A|Z_A) in reducing path uncertainties with proper action selection. However, compared to the joint quality h(F,X_A) − h(F,X_A|Z_A), the sensing quality h(F) − h(F|Z_A) is more robust to the a priori path distributions. This is favorable, since our target is to provide the most informative path with a low H(X|Z_A) entropy, and in these experiments we observed that the proposed sensing quality fulfills these requirements. The sensing quality h(F) − h(F|Z_A) is also the best option for modelling, as was stated in Proposition 3.2.3.


4 Action planning

In action planning, our objective is to find a sequence of actions that provides a certain sensing quality in minimal time. Given an initial location x_1, we can define a map from a sequence of actions (u_1,...) to some path (x_1,...),

x_{i+1} = m(x_i,u_i),

where m(x,u) is a motion model. We can define an isomorphism between a sequence of actions and the designed path by x_A ≅ u_A, where u_A = argmin_u {t(u) : u → x_A} and t(u) maps a sequence of actions to the travel time. This means that for each path x_A we have a corresponding sequence of actions with minimal travel time. Having also the isomorphism (7), we notice that informative action planning can be described in the design space A ∈ A. As such, our action planning problem is now stated as

min_A T(A)   (31)
subject to q(A) ≥ C_I,

where T maps the design to travel time. Following Sections 3.2 and 3.3, we can use the sensing quality q(A) = h(F) − h(F|Z_A) for both traditional action planning and action planning under sensing path uncertainties. Now, how do we map a design A to the control actions u_A when X_A is a continuous sensing path distribution? This is straightforward if we assume X_{i+1} = M(X_i,u_i), where M(x,u) is a stochastic motion model. This simply means the sensing path and design uncertainties are due to the uncertainties in the motion model, and e.g. E[X_A] → u_A can be used in action planning.

To provide computationally efficient solutions to the action planning problem (31) in spatially correlated environments, our approach is to map the sensing quality from sub-paths onto mutually independent action planning points. This enables utilizing graph approximation methods in path planning. To make this mapping sound, we require that the sensing quality is preserved among an independent set of sub-paths. To demonstrate our approach, we utilize here the quota TSP approach, where the sensing quality is expressed through vertex prices.

In the following subsections, we present related work, define the design spaces for action planning, and propose our vertex price generation algorithm, followed by the informative path planning algorithm with experiments. Finally, we show how path planning under sensing path uncertainties can be realized with graph approximation methods.

4.1 Related work

The challenge with path planning in the environmental modelling context is that sensing locations or sensing paths are, in most cases, mutually dependent.

Informative path planning has been an active development area in recent years for orienteering problems. Chekuri and Pal presented an algorithm [21] that provides a 1 + log(k) approximation guarantee when the sensing quality function is submodular. In [16], Singh et al. presented an approach that creates mutually approximately independent sensing point clusters, over which one can utilize a modular orienteering algorithm [20] with a 2 + δ approximation guarantee. Binney et al. [22] proposed a branch and bound search algorithm for the orienteering problem with a monotonicity requirement on the quality function. The monotonicity is required to assure that an upper bound for some set of sensing locations is also an upper bound for any of its subsets. Hollinger and Sukhatme [23] proposed several sampling based rapidly exploring search algorithms that provably approach the optimal solution asymptotically. In order to restrict the search space, their approach utilized similar upper bounds to prune infeasible solutions and to speed up the search procedure. For these methods, Suh et al. [24] proposed a cross-entropy planner based modification to provide a tighter upper bound and faster convergence. Unfortunately, any approximately optimal orienteering solution can provide arbitrarily bad results for quota stroll, tour or TSP problems. This can be observed by first assuming the orienteering solution has a path p with length l_p, such that 0 < l_p ≤ C_l, where C_l is the path length constraint, and sensing quality q(p), such that C_I ≤ q(p) < q*, where q* is the sensing quality of the optimal orienteering solution and C_I the information constraint for quota problems. This allows a path q with length l_q = 0 and information I_q ≥ C_I to exist.

For quota problems' search spaces, if one can visit the same places multiple times, the only pruning that can be utilized is to remove a search direction if the shortest path to the target is longer than the current best solution. If multiple visits are not allowed, one can also prune a search direction if there is not enough information available in the connected space of unvisited locations between the search direction and the target location. In [3], we presented a submodular quota tour approach where we utilized the former assumption. In addition, to decrease the search space, we partitioned all s−t paths into classes with closely similar information content, and proved that with such an approach the search algorithm can attain an approximation guarantee of 2. We observed, however, that the approach was suitable only for relatively small state spaces. Here, we propose to make action planning easier by removing the mutual dependence between action selections. Our approach differs from [33] and [16] in that we do not remove sensing points to assure independence. Instead, we distribute information among action planning points, such that these points preserve the information in mutually independent areas. This naturally means that some of these points get more information than others, which supports the intuition that among spatially correlated variables, sparse sampling is sufficient. This approach allows us to use efficient approximation algorithms, such as [19], [34], and could possibly also allow sampling based path planning algorithms. In this thesis, we demonstrate our approach with Garg's k-TSP algorithm [19], which can easily be transformed into a quota variant.

4.2 Design space for action planning

In autonomous environmental modelling, the need to replan actions is typically less frequent than the required sampling interval. For example, scanning the bottom of a lake with sonar usually provides samples at a 1 second interval; however, the need to change the vessel's course is less frequent. This has the benefit that the design space A can be much smaller, requiring less computational effort. Respectively, indoor magnetic field SLAM benefits from a dense sampling interval, whereas action replanning can be restricted to doorways, corners, corridor intersections and regular grid points in open areas. This is illustrated in Fig 60, where we observe three sub-paths p1 = (x1,x2,x3,x4), p2 = (x4,x5,x6) and p3 = (x4,x7,x8,x9), when information is assumed to be independent of the path direction. As such, the design space for different paths consists of A = {p1, p2, p3, (p1,p2), (p1,p3), (p2,p3)}. Now, the design matrix is given by

X_{(p1,p2)} =
[ 1 0 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 ]
[ 0 0 0 1 0 0 0 0 0 ]
[ 0 0 0 1 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 0 ]
[ 0 0 0 0 0 1 0 0 0 ],

where we have assumed F is marginalized on the basis (δ_{x1},...,δ_{x9}) with Dirac delta functions δ_{xi}(x) = δ(x − x_i).
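The construction is easy to mechanize: concatenating the location indices of the chosen sub-paths and one-hot encoding each visit reproduces X_{(p1,p2)} above, with the shared point x4 appearing as a duplicated row. A sketch (the function name is ours):

import numpy as np

def design_matrix(paths, n_basis):
    # One row per visit, selecting the Dirac basis function of the visited
    # location; locations shared by sub-paths yield duplicated rows.
    idx = [i for p in paths for i in p]
    X = np.zeros((len(idx), n_basis))
    X[np.arange(len(idx)), idx] = 1.0
    return X

p1, p2 = [0, 1, 2, 3], [3, 4, 5]    # x1,...,x4 and x4,...,x6 as 0-based indices
print(design_matrix([p1, p2], 9))   # reproduces X_(p1,p2) above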

[Fig 60: nine sensing points x1,...,x9; the sub-paths p1 = (x1,x2,x3,x4), p2 = (x4,x5,x6) and p3 = (x4,x7,x8,x9) meet at x4, with control actions u1,...,u5 along the designed path.]

Fig 60. An example of a design space. Sensing points are visualized with dots and possible displacements between sensing points with dashed lines. Arrowed lines present control actions when moving along a designed sensing path.

4.3 Vertex price generation

As stated in the previous section, we assume any possible action plan design consists of connected sub-paths. Sub-paths and connection points can be naturally described with simple graphs, where each sub-path maps to an edge and each connection point to a vertex. Sensing qualities over these edges are, in general, not independent, and in order to utilize the graph approach, we map the sensing qualities from the edges to the vertices. We begin with graphs where two nonadjacent edges are conditionally independent given an adjacent edge, see Section 3.2. Let us first consider the following graph G = (V,E), visualized by

v1 --e1-- v2 --e2-- v3 --e3-- v4,

consisting of conditionally independent edges. Now, for the total information, we have

q(e1,e2,e3) = q(e1) + q(e2|e1) + q(e3|e2)
            = q(e3) + q(e2|e3) + q(e1|e2),

and through summation and division by 2, we have

q(e1,e2,e3) = (1/2) (q(e1) + q(e3) + q(e2|e1) + q(e1|e2) + q(e3|e2) + q(e2|e3)),

which can be distributed among the vertices as π(v1) = (1/2) q(e1), π(v2) = (1/2) (q(e1|e2) + q(e2|e1)), π(v3) = (1/2) (q(e2|e3) + q(e3|e2)), π(v4) = (1/2) q(e3). That way, each vertex contains some of the information contained in its incident edges. Next, let us take a slightly more complicated graph, visualized by

v1 --e1-- v2 --e2-- v3 --e3-- v4
                    |
                    e4
                    |
                    v5

where we find

q(e1,e2,e3,e4) = q(e1) + q(e2|e1) + q(e3,e4|e2)
              = q(e3,e4) + q(e2|e3,e4) + q(e1|e2),

and for the vertices, π(v1) = (1/2) q(e1), π(v2) = (1/2) (q(e1|e2) + q(e2|e1)), π(v3) = (1/2) (q(e2|e3,e4) + q(e3,e4|e2)), π(v4) = (1/4) (q(e3) + q(e3|e4)), π(v5) = (1/4) (q(e4) + q(e4|e3)). We notice that if the edges e3 and e4 were independent, the information for the leaf nodes v4 and v5 would be computed the same way as for v1. This procedure gives more weight to the central nodes, which control the larger amount of information, with the following rationale: removing a central node also removes all edges connected to it.
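These decompositions can be checked numerically. The following sketch uses the full chain-rule conditionings q(e1|e2,e3) and q(e3|e1,e2) for the four-vertex chain above (these reduce to the expressions given earlier under the stated conditional independences), with a Gaussian measurement model, and verifies that the vertex prices sum exactly to the total quality q(e1,e2,e3); the locations assigned to each edge are illustrative:

import numpy as np

def cond_quality(K, noise_var, e, c):
    # q(e|C) = h(Z_e | Z_C) - h(Z_e | F) for jointly Gaussian measurements:
    # log-det of the conditional covariance minus the sensor noise entropy.
    e, c = list(e), list(c)
    S = K[np.ix_(e, e)]
    if c:
        S = S - K[np.ix_(e, c)] @ np.linalg.solve(K[np.ix_(c, c)], K[np.ix_(c, e)])
    return 0.5 * (np.linalg.slogdet(S)[1] - len(e) * np.log(noise_var))

x = np.linspace(0.0, 6.0, 7)                 # measurement locations on the chain
s2 = 0.01                                    # sensor noise variance
K = np.exp(-0.5 * np.subtract.outer(x, x)**2) + s2 * np.eye(7)
e1, e2, e3 = [0, 1], [2, 3], [4, 5, 6]       # locations measured on each edge

q = lambda e, c=(): cond_quality(K, s2, e, c)
pi = [0.5 * q(e1),                           # pi(v1)
      0.5 * (q(e1, e2 + e3) + q(e2, e1)),    # pi(v2)
      0.5 * (q(e2, e3) + q(e3, e1 + e2)),    # pi(v3)
      0.5 * q(e3)]                           # pi(v4)
print(sum(pi), q(e1 + e2 + e3))              # the prices preserve the total quality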

Next, we continue with cases where some non-adjacent edges are dependent. Let us consider the following graph,

v1 --e1-- v2 --e2-- v3 --e3-- v4 --e4-- v5 --e5-- v6,

where all incident edges, and additionally the edges e1 and e3, and the edges e3 and e5, are dependent. For the total information, we now have

q(e1,e2,e3,e4,e5) = q(e1) + q(e2|e1) + q(e3,e4|e2,e1) + q(e5|e3,e4)
                  = q(e5) + q(e4|e5) + q(e3|e4,e5) + q(e2|e3,e4) + q(e1|e2,e3),

which can be distributed among the vertices by π(v1) = (1/2) q(e1), π(v2) = (1/2) (q(e1|e2,e3) + q(e2|e1)), π(v3) = (1/2) (q(e2|e3,e4) + q(e3,e4|e2,e1)), π(v4) = (1/2) q(e3|e4,e5), π(v5) = (1/2) (q(e4|e5) + q(e5|e3,e4)), π(v6) = (1/2) q(e5). Since e1 and e5 are independent, it is natural that v1 and v6 get more weight (less conditioning) compared to v4. In general, depending on the case, there is more than one way to arrange information from edges to vertices. In Definition 4.3.1, we give the rules for a proper edge-vertex map.

Definition 4.3.1. For a graph G = (V,E), let us denote by E_v the set of edges incident to vertex v, by q(e|e_j,...,e_l) a conditional edge quality (see Section 3.2), and by π(v) a vertex price created by mapping edge qualities to vertex v. A proper edge-vertex mapping is defined by:

1. If all edges incident to vertex v are independent, π(v) = (1/2) ∑_{e∈E_v} q(e).
2. In any case, π(v) ≥ (1/2) ∑_{e∈E_v} q(e|E).
3. In any case, π(v) ≤ (1/2) ∑_{e∈E_v} q(e).
4. The total quality over the graph must be preserved, i.e., ∑_{v∈V} π(v) = q(e_1,...,e_{|E|}).

In the definition, the third requirement naturally follows from the first one. The second requirement means that the price cannot be less than half the sum of the fully conditioned edge qualities, where the conditioning is over all edges in the graph. Algorithm 4.1 presents a method to create vertex prices following Definition 4.3.1. At line 2, the algorithm selects among all vertices the vertex with the smallest degree; at line 3, it computes the vertex price; and at line 7, it removes this vertex from the set of selectable vertices. At line 9, the algorithm creates, from the set of selectable vertices, the set of vertices adjacent to the selected vertex. At line 11, it selects from these adjacent vertices the vertex with the smallest degree. If the set of adjacent vertices is empty, at lines 13-14 the algorithm creates the set of unselected vertices adjacent to already selected vertices, and at lines 15-16 it selects the vertex with the smallest degree and records the selected vertex to which it is connected. At line 18, the algorithm creates an edge from the selected vertex to the set of previously selected vertices, and at line 19 it removes this edge from the set of unselected edges. Finally, at line 20, the algorithm computes the vertex price, and on lines 21-24 it initializes the next iteration. The price computation at lines 3 and 20 is the same once we observe that at the beginning there is no edge e and E_S is empty. From the algorithm we observe that the loop between lines 9 and 24 runs |V|−1 times, until each vertex has a price.

From lines 3 and 20, we can deduce that requirements 1, 2 and 3 hold.


Algorithm 4.1 Vertex price formation

Input: A graph (V,E) and edge quality function q
Output: Vertex price function π

1: E* ← E
2: v ← argmin_{v'∈V} deg(v')
3: π(v) ← (1/2) q(E_v)
4: v+ ← v
5: E_S ← E_v
6: E_T ← ∅
7: V* ← V \ {v}
8: while |V*| > 0 do
9:   V' ← {v : v ∈ V* ∧ (v+,v) ∈ E}
10:  if |V'| > 0 then
11:    v ← argmin_{v'∈V'} deg(v')
12:  else
13:    V+ ← V \ V*
14:    V' ← {v : v ∈ V* ∧ (v+,v) ∈ E ∧ v+ ∈ V+}
15:    v ← argmin_{v'∈V'} deg(v')
16:    v+ ← v' ∈ V+ : (v',v) ∈ E
17:  end if
18:  e ← (v+,v)
19:  E* ← E* \ {e}
20:  π(v) ← (1/2) (q(e|E*) + q(E_v \ E_S | E_S) + q((E_v ∩ E_S) \ {e} | E_T))
21:  v+ ← v
22:  E_T ← E_T ∪ ((E_v ∩ E_S) \ {e})
23:  E_S ← E_S ∪ E_v
24:  V* ← V* \ {v}
25: end while
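For reference, a direct Python transcription of Algorithm 4.1 under some assumptions of ours: edges are represented as frozensets of vertex pairs, the graph is connected, and q(edges, given) is a callable returning the conditional quality q(edges | given), for instance a set-of-edges version of the Gaussian cond_quality sketched earlier in this section:

def vertex_prices(V, E, q):
    # Transcription of Algorithm 4.1; line numbers refer to the listing above.
    inc = {v: {e for e in E if v in e} for v in V}   # E_v for every vertex
    deg = {v: len(inc[v]) for v in V}
    E_star = set(E)                                  # line 1
    v = min(V, key=deg.get)                          # line 2
    pi = {v: 0.5 * q(inc[v], set())}                 # line 3
    v_plus, E_S, E_T = v, set(inc[v]), set()         # lines 4-6
    V_star = set(V) - {v}                            # line 7
    while V_star:                                    # line 8
        Vp = [u for u in V_star if frozenset((v_plus, u)) in E]   # line 9
        if Vp:
            v = min(Vp, key=deg.get)                 # line 11
        else:                                        # lines 13-16
            done = set(V) - V_star
            pairs = [(u, w) for u in V_star for w in done
                     if frozenset((u, w)) in E]
            v, v_plus = min(pairs, key=lambda p: deg[p[0]])
        e = frozenset((v_plus, v))                   # line 18
        E_star.discard(e)                            # line 19
        pi[v] = 0.5 * (q({e}, E_star)                # line 20
                       + q(inc[v] - E_S, E_S)
                       + q((inc[v] & E_S) - {e}, E_T))
        v_plus = v                                   # line 21
        E_T |= (inc[v] & E_S) - {e}                  # line 22
        E_S |= inc[v]                                # line 23
        V_star.discard(v)                            # line 24
    return pi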


For the total information produced by the algorithm, we have

∑_{i=1}^{|V|} π(v_i) = (1/2) ∑_{i=1}^{|V|−1} q(e_i | E \ {e_1,...,e_i})
    + (1/2) ∑_{i=1}^{|V|} q(E_{v_i} \ (E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···) | E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···)
    + (1/2) ∑_{i=1}^{|V|} q((E_{v_i} ∩ (E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···)) \ {e_{i−1}} | ((E_{v_{i−1}} ∩ (E_{v_{i−2}} ∪ E_{v_{i−3}} ∪ ···)) \ {e_{i−2}}) ∪ ((E_{v_{i−2}} ∩ (E_{v_{i−3}} ∪ E_{v_{i−4}} ∪ ···)) \ {e_{i−3}}) ∪ ···),

where e_1,... is the sequence of edges the algorithm forms at line 18 and v_1,... is the sequence of vertices the algorithm selects at lines 2, 11 and 15. For the different terms, we find

(1/2) ∑_{i=1}^{|V|−1} q(e_i | E \ {e_1,...,e_i}) = (1/2) q(E) − (1/2) q(E \ {e_1,...,e_{|V|−1}}),

(1/2) ∑_{i=1}^{|V|} q(E_{v_i} \ (E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···) | E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···) = (1/2) q(E)

and

(1/2) ∑_{i=1}^{|V|} q((E_{v_i} ∩ (E_{v_{i−1}} ∪ E_{v_{i−2}} ∪ ···)) \ {e_{i−1}} | ((E_{v_{i−1}} ∩ (E_{v_{i−2}} ∪ ···)) \ {e_{i−2}}) ∪ ((E_{v_{i−2}} ∩ (E_{v_{i−3}} ∪ ···)) \ {e_{i−3}}) ∪ ···)
    = (1/2) ∑_{i=1}^{|V|} q(E_{v_i} ∩ (E \ {e_1,...,e_{|V|−1}}) | (E_{v_{i−1}} ∩ (E \ {e_1,...,e_{|V|−1}})) ∪ (E_{v_{i−2}} ∩ (E \ {e_1,...,e_{|V|−1}})) ∪ ···)
    = (1/2) q(E \ {e_1,...,e_{|V|−1}}).

Summing these terms together, we find that requirement 4 also holds.

4.4 Informative path planning algorithm

Our approach here is based on the idea presented by Ausiello et al. in [35]: by using a k-TSP algorithm with non-uniform vertex prices, we can obtain a quota TSP solution. We utilize here Garg's 2-approximate k-MST algorithm [19], which can be transformed into a quota MST by setting for each initial component, i.e. each singleton set of vertices, the potential γπ(v), where γ is the initial potential in Garg's algorithm. In order to transform this into a 2-approximate quota TSP, we simply need to double the tree edges, see e.g. [36]. In order to guarantee that the generated route actually obtains the required sensing quality, we require that the procedure Q returns for a potential γ− a tree with edges E_{γ−} such that q(E_{γ−}) < C_I, and for a potential γ+ a tree with edges E_{γ+} such that q(E_{γ+}) ≥ C_I.

The tour based on doubled edges might be inefficient, and if all information were actually in the vertices, we could safely shortcut the route using e.g. Daneiko's exact algorithm [37]. However, since our information is truly in the sub-paths, this might result in infeasible tours. Our solution is to first shortcut the tour and then return some of the original doubled edges. This is presented in the following subsection.

4.4.1 Informative shortcutting heuristic

Let us first assume that we have an infeasible shortcut tour with edges E_o : q(E_o) < C_I and a feasible doubled-tree tour with edges E_r : q(E_r) ≥ C_I. Our objective is to return some of the tree edges such that the length of the resulting feasible tour is minimal. We start by creating the longest continuous paths on edges that do not belong to the tree and that connect two tree vertices, and denote these by p_1,...,p_n. Next, using the endpoints u, v of each path p_i, we create the shortest path r_i using the edges of the tree. These are illustrated in Fig 61.

Having the tree and path edges, we define the replacement function by X_i : p_i → r_i. The informativeness of a replacement is given by

f(X_i) = q((E_o \ p_i) ⊎ r_i),

where the symbol ⊎ denotes a disjoint union, used to indicate that the tree path edges may occur twice when replaced into the tour. Respectively, for a set of replacements A ⊆ X = {X_1,...,X_n}, we have

f(A) = q(E_o \ ⋃_{i:X_i∈A} p_i ⊎ ⋃_{i:X_i∈A} r_i).   (32)

Now, we define the minimization problem as

min_{A⊆X} c(A)   (33)
subject to f(A) ≥ C_I,


Fig 61. Tree and tour paths. Solid lines represent tree edges that remained after shortcutting, dotted lines represent tree edges that were removed in shortcutting, and dashed lines the paths that were created by shortcutting. Red represents the path p_i, whereas blue represents the path r_i.

where c(A) = ∑_{i:X_i∈A} (c(r_i) − c(p_i)), which can be transformed into the problem

min_{A⊆X} c(A) + λ(C_I − f(A)),   (34)

where λ ≥ 0 can be interpreted as a set-theoretic variant of a Karush-Kuhn-Tucker multiplier. Next, we show the conditions under which minimizing problem (34) finds a solution that also minimizes problem (33).

Proposition 4.4.1. Assume A* ⊆ X minimizes problem (33). Then A* also minimizes problem (34) iff

λ < (c(A) − c(A*)) / (f(A) − f(A*))   ∀A ⊆ X : c(A*) < c(A) ∧ f(A*) ≤ f(A)

and

λ > (c(A*) − c(A)) / (f(A*) − f(A))   ∀A ⊆ X : c(A*) > c(A) ∧ f(A*) ≥ f(A).

Proof. We consider the four possible cases:

1. c(A*) < c(A) ∧ f(A*) ≤ f(A),
2. c(A*) > c(A) ∧ f(A*) ≤ f(A),
3. c(A*) < c(A) ∧ f(A*) ≥ f(A), and
4. c(A*) > c(A) ∧ f(A*) ≥ f(A).

For the first case, we find that A* is a solution for problem (34) iff

c(A*) + λ(C_I − f(A*)) < c(A) + λ(C_I − f(A)) ⇔ λ < (c(A) − c(A*)) / (f(A) − f(A*)).

The second case is not possible, since f(A) ≥ f(A*) ≥ C_I and A* is the optimal solution of (33). For the third case, we observe

c(A*) + λ(C_I − f(A*)) < c(A) + λ(C_I − f(A))   ∀λ ≥ 0,

and for the fourth we obtain

c(A*) + λ(C_I − f(A*)) < c(A) + λ(C_I − f(A)) ⇔ λ > (c(A*) − c(A)) / (f(A*) − f(A)).

The conditions in Proposition 4.4.1 state that, in many cases, minimizing problem (34) may not find the optimal solution, since the required lower bound for λ can exceed the required upper bound. If the upper bound for λ, arising from the condition c(A*) < c(A) ∧ f(A*) ≤ f(A), is small, this means that the information increment f(A) − f(A*) when selecting A instead of A* is large while the cost increment c(A) − c(A*) is small. Such a result would also be acceptable for us. Respectively, if the lower bound for λ, arising from the condition c(A*) > c(A) ∧ f(A*) ≥ f(A), is large, the information loss f(A*) − f(A) when selecting A instead of A* is small while the cost benefit c(A*) − c(A) is large. In such a case, we would expect that slowly increasing λ quite soon brings f(A) over C_I, such that with A+ = argmin_{A⊆X} c(A) + λ(C_I − f(A)) the cost increment is small. This suggests that, using e.g. binary search over λ values, we should be able to find a threshold such that f(A+) ≥ C_I and c(A+) is not too far from the optimal value c(A*).

Minimization of the set function in problem (34) is challenging unless the function has some special properties. For example, if we could prove that f is supermodular, then from the linearity of c(A) we could deduce that the minimization problem is submodular, for which exact polynomial time solutions exist, such as [38], [39]. However, neither the supermodularity nor the submodularity of f can, in general, be guaranteed, and it is not even plausible to assume so. Here, we use a simple greedy heuristic to find a local optimum for each λ.
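A sketch of this procedure: a greedy local search for problem (34) at a fixed λ, wrapped in a binary search for the smallest λ whose greedy solution is feasible. The names are ours, and the full replacement set is assumed feasible, since returning every tree path recovers the doubled-tree tour:

def greedy_min(X, c, f, lam, CI):
    # Greedy descent on (34): repeatedly add the replacement that most
    # decreases c(A) + lam * (CI - f(A)); stop at a local optimum.
    A = frozenset()
    obj = c(A) + lam * (CI - f(A))
    improved = True
    while improved:
        improved = False
        for Xi in X - A:
            B = A | {Xi}
            val = c(B) + lam * (CI - f(B))
            if val < obj:
                A, obj, improved = B, val, True
    return A

def informative_shortcut(X, c, f, CI, lam_hi=1e4, iters=40):
    # Binary search over lambda: larger lambda weights information more, so
    # feasibility f(A) >= CI is (heuristically) monotone in lambda.
    lo, hi, best = 0.0, lam_hi, frozenset(X)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        A = greedy_min(X, c, f, lam, CI)
        if f(A) >= CI:
            best, hi = A, lam
        else:
            lo = lam
    return best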

4.4.2 Experiments

In the experiments, our objective was to demonstrate how our vertex price generation and shortcutting heuristic help to improve the sensing time in sensing quality constrained environmental modelling. Our experimental environment consists of evenly spaced locations and a differential drive robot that can rotate at each location and move straight between locations. For each location, there are eight states, one for each possible orientation. The translation speed is 0.25 m/s and the rotation speed 36 deg/s, which provides the edge costs as moving times. For the sensing quality, we sample with a fixed sampling interval a stationary Gaussian process resulting from a convolution of white noise with a Gaussian convolution kernel. The white noise variance is 1.2 and the kernel width 0.45 m, whereas the sensor noise is 0.5.

In the first experiment, we considered informative path planning for a state space with two locations at one meter distance. For each location, we set up eight states, each representing the robot at a different orientation with 45 degree intervals. Each state is connected to the two rotationally closest states. Fig 62 shows the vertex prices generated using Algorithm 4.1. The edge costs (not visualized) are 4 for the edges connecting the two locations, and 1.25 for the rotation edges. We see that the vertices connecting the two grids have obtained a higher price. Fig 63 shows the quality of the tree obtained with the modified Goemans and Williamson algorithm for different initial potentials γ. We observe that the sensing quality is monotonically increasing, as expected, in most parts, which supports binary search for the threshold γ values. Fig 64 and Fig 65 present the obtained route qualities and costs, respectively, for the double-tree, shortcut, and informatively shortcut routes. For double-tree route generation we utilized Garg's quota TSP described previously. For shortcutting we utilized Daneiko's exact algorithm. Finally, for informative shortcutting, we utilized the edge-restoring approach described in Section 4.4.1. Fig 66 shows the generated paths using these vertex prices with different target quality values. We observe that in this simple two location case, double-tree route generation already provides efficient solutions that cannot be shortcut without losing the quality constraint. We also wanted to verify that our vertex price generation is valid. We did this by comparing quota TSP results when using vertex prices from Algorithm 4.1 against unit prices for all vertices. Fig 67 and Fig 68 present the route qualities and costs, respectively, for different target qualities. We observe that with vertex price generation, the route costs are significantly smaller. This follows from the smaller prices at the uninformative rotation vertices, as shown in Fig 62.


Fig 62. Vertex price formation example.

Fig 63. Double-tree qualities for different gamma.


Fig 64. Obtained route qualities for different target qualities.

Fig 65. Route costs for different target qualities.


(a) Target quality = 5. (b) Target quality = 10. (c) Target quality = 15.

Fig 66. Routes for different target qualities. Blue lines represent double-tree edges, red arrows the shortcut path, and green arrows the informative shortcut path. For target qualities 5 and 10, all paths have the same edges. For target quality 15, informative shortcutting restored double-tree edges for the shortcut v → u.


Fig 67. Double-tree route qualities using unit prices and generated prices. We observe that both approaches can provide double-trees with information close to the target.

Fig 68. Double-tree route costs using unit prices and generated prices. We observe that using the vertex prices obtained with Algorithm 4.1 provides significantly smaller route costs.


In the second experiment, we used a more realistic state space consisting of sixteen locations spaced evenly in a square area with one meter distances, where the states and connectivity for each location are the same as previously. Also the underlying measurement model is the same, and the edge costs result from the same rotational and translational speeds. The robot can move between locations such that every state is connected to the closest state with the same orientation. That way the graph connecting this state space consists of 128 vertices and 212 edges. Fig 69 presents the qualities obtained through the modified Goemans and Williamson algorithm with different gamma values, and Fig 70 and Fig 71 the route qualities and costs, respectively, for double-tree and shortcut routes. Figure 72 presents routes with different target qualities. We observe that informative shortcutting helps to decrease the sensing time in many cases. We were also interested to see whether the vertex price generation provides any benefit for this kind of open space route optimization. Fig 73 and Fig 74 show the route qualities and costs, respectively, for the double-tree path with both generated prices and unit prices. Even though in this state space almost all vertices are equally connected to the surrounding vertices, the benefit shown in Fig 74 is due to the sparsity in prices. Due to the residual information in vertex price generation, setting one vertex with high quality usually means smaller prices for the adjacent vertices. This is also beneficial for the true quality, since the joint information between adjacent edges is usually much smaller than the sum of the edge information.


Fig 69. Double tree edge qualities for different gamma.

Fig 70. Route qualities for different target qualities.


Fig 71. Route costs for different target qualities.


(a) Target quality = 50.75. (b) Target quality = 101.5.

Fig 72. Routes for different target qualities. Blue lines represent double-tree edges, red arrows the shortcut path, and green arrows the informative shortcut path. For target quality 50.75, shortcutting returns edges with a quality over the required level. For target quality 101.5, in order to obtain enough information, informative shortcutting has returned a tree path between locations r and v. Respectively, for target quality 145, informative shortcutting has returned a tree path between locations u and v.


(c) Target quality = 145.

Fig 72. Routes for different target qualities. Blue lines represent double-tree edges, red arrows the shortcut path, and green arrows the informative shortcut path. For target quality 50.75, shortcutting returns edges with a quality over the required level. For target quality 101.5, informative shortcutting has returned a tree path between locations r and v. Respectively, for target quality 145, informative shortcutting has returned a tree path between locations u and v. (cont.)


Fig 73. Double-tree route qualities using unit prices and generated prices. We observe that both approaches can provide double-trees with information close to the target.

Fig 74. Double-tree route costs using unit prices and generated prices. We observe that using the vertex prices obtained with Algorithm 4.1 provides significantly smaller route costs.


With these experiments, we demonstrated the quota TSP approach together with the vertex price generation for dealing with spatial dependencies between sensing paths. If we adapt our models between control actions, we simply need to update the vertex prices and run the quota TSP again. The same applies if the sensing path uncertainties are updated. Incorporating sensing path uncertainties into the vertex prices needs some simplifications, introduced in the next section.

4.5 Path planning with sensing path uncertainties

In Section 3.3.1, we explained how to create the sensing quality when the sensing path is uncertain. However, we notice that an edge e usually occurs in multiple different feasible tours. Following from the motion model X_{i+1} = M(X_i,u_i), this means that also the marginal distribution

P(x_e|z_A) = ∑_{x_{e_1≠e}} ··· ∑_{x_{e_n≠e}} P(x_{e_1≠e},...,x_{e_i=e},...,x_{e_n≠e} | z_A),

where A = (e_1,...,e_n), depends on the design A. For some designs, the distribution P(x_e|z_A) may be very narrow, whereas for others it may be very wide, and to be able to work with the quota TSP algorithm, we need to summarize P(x_e|z_A) over the different designs. To simplify the computations, we also make P(x_e|z_A) independent of the surrounding edges, such that P(x_e|x_{A\{e}},z_A) = P(x_e|z_e). Now, the sensing quality over the edges E can be given by

$$
\begin{aligned}
q(E) = {}& h(F) - \frac{1}{|E|} \sum_{e \in E} \sum_{x_e} \sum_{x_{E^*}} \int_{z_e} \int_{z_{E^*}} h(F \mid X_e = x_e, Z_e = z_e, X_{E^*} = x_{E^*}, Z_{E^*} = z_{E^*}) \\
& \cdot P(x_e \mid z_e)\, P(x_{E^*} \mid z_{E^*})\, p(z_e)\, dz_e\, p(z_{E^*})\, dz_{E^*} - H(X_E \mid Z_E) + H(X_E \mid Z_E, F),
\end{aligned}
$$

where $E^* = E \setminus e$. This is still computationally infeasible; however, we make our last simplification with the following approximation,

$$
\sum_{x_{E^*}} h(F \mid X_e = x_e, Z_e = z_e, X_{E^*} = x_{E^*}, Z_{E^*} = z_{E^*})\, P(x_{E^*} \mid z_{E^*})
\approx h(F \mid X_e = x_e, Z_e = z_e, \mathrm{E}[X_{E^*}], Z_{E^*} = z_{E^*}),
$$


which results in

$$
\begin{aligned}
q(E) &= h(F) - \frac{1}{|E|} \sum_{e \in E} \sum_{x_e} \int_{z_e} \int_{z_{E^*}} h(F \mid X_e = x_e, Z_e = z_e, \mathrm{E}[X_{E^*}], Z_{E^*} = z_{E^*}) \\
&\qquad \cdot P(x_e \mid z_e)\, p(z_e)\, dz_e\, p(z_{E^*})\, dz_{E^*} - H(X_E \mid Z_E) + H(X_E \mid Z_E, F) \\
&= h(F) - \frac{1}{|E|} \sum_{e \in E} \sum_{x_e} \int_{z_e} h(F \mid X_e = x_e, Z_e = z_e, \mathrm{E}[X_{E^*}], Z_{E^*}) \\
&\qquad \cdot P(x_e \mid z_e)\, p(z_e)\, dz_e - H(X_E \mid Z_E) + H(X_E \mid Z_E, F). \qquad (35)
\end{aligned}
$$

Finally, we can take the measurements also to be discretely distributed, resulting in

$$
q(E) = h(F) - \frac{1}{|E|} \sum_{e \in E} \sum_{x_e} \sum_{z_e} h(F \mid X_e = x_e, Z_e = z_e, \mathrm{E}[X_{E^*}], Z_{E^*})\, P(x_e \mid z_e)\, P(z_e) - H(X_E \mid Z_E) + H(X_E \mid Z_E, F). \qquad (36)
$$

From equation (36), we observe that we can now generate for each edge $e$ a set of random paths $x_e^1, \ldots, x_e^m$ drawn from the a priori distribution $P(x_e)$ and, respectively, a set of measurements $z_e^1, \ldots, z_e^p$ drawn from the marginal distribution $P(z_e)$.
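The following minimal sketch illustrates this Monte Carlo evaluation on a toy 1-D Gaussian field; the grid model and the `path_prior` localization jitter are assumptions made for the example. For a Gaussian model the posterior entropy does not depend on the measurement values, so the sum over $z_e$ collapses; the localization terms of equation (36) and the conditioning on $\mathrm{E}[X_{E^*}]$ are also omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(cov):
    # h(F) = 0.5 * log det(2*pi*e*cov) for a Gaussian random vector
    return 0.5 * np.linalg.slogdet(2.0 * np.pi * np.e * cov)[1]

def posterior_cov(prior_cov, obs_idx, noise_var=0.1):
    # Toy GP update: condition the field on noisy point observations.
    H = np.eye(prior_cov.shape[0])[obs_idx]
    S = H @ prior_cov @ H.T + noise_var * np.eye(len(obs_idx))
    return prior_cov - prior_cov @ H.T @ np.linalg.solve(S, H @ prior_cov)

def sensing_quality(edges, prior_cov, path_prior, n_paths=20):
    # Monte Carlo estimate of the entropy-reduction term of eq. (36):
    # for each edge e, average h(F | X_e = x_e, ...) over sampled paths
    # x_e^1, ..., x_e^m drawn from the path prior.
    h_f = gaussian_entropy(prior_cov)
    avg = 0.0
    for e in edges:
        samples = [gaussian_entropy(posterior_cov(prior_cov, path_prior(e, rng)))
                   for _ in range(n_paths)]
        avg += np.mean(samples)
    return h_f - avg / len(edges)

# Toy usage: an "edge" is a run of grid cells observed with +/-1 cell jitter.
n = 30
g = np.arange(n)
prior = np.exp(-0.5 * (g[:, None] - g[None, :])**2 / 3.0**2) + 1e-6 * np.eye(n)

def path_prior(edge, rng):
    lo, hi = edge
    return np.clip(np.arange(lo, hi) + rng.integers(-1, 2), 0, n - 1)

print(sensing_quality([(2, 8), (15, 25)], prior, path_prior))
```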


5 Conclusions

In this thesis, we proposed new methods to support the development of autonomy for robotic systems that aim to optimize different environmental modelling tasks. We focused on three topics: first, how to enable models to adapt to new sensor information; second, from the modelling perspective, how to define the sensing quality of different action plans; and third, how to plan actions, in a computationally feasible manner, that reach the target quality approximately or nearly optimally. In the following, we summarize the results for each topic and discuss possible improvements.

5.1 Model adaptation

Assuming the observed phenomenon is due to a Gaussian white noise process convolved with location dependent anisotropic diffusion kernels, we developed an adaptation method that is able to find close-to-correct kernel parameters in computationally feasible time. This adaptation was obtained by connecting the local structure tensor to the parameters of an anisotropic diffusion kernel. For the cases where the sensor noise is unknown or the diffusion kernel's time-scale parameter is location dependent, we developed an iterative update method that is able to converge close to the true values, starting from arbitrary initial values. The developed methods were validated with randomly generated and real data.
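As a rough illustration of the structure tensor connection (a generic sketch, not the exact parameter mapping developed in this thesis), the tensor can be estimated from smoothed gradient outer products and eigendecomposed in closed form:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(field, sigma=2.0):
    # J = G_sigma * (grad f  grad f^T), smoothed per grid cell
    fy, fx = np.gradient(field)
    return (gaussian_filter(fx * fx, sigma),
            gaussian_filter(fx * fy, sigma),
            gaussian_filter(fy * fy, sigma))

def kernel_orientation(Jxx, Jxy, Jyy):
    # Closed-form eigendecomposition of the symmetric 2x2 tensor: theta is
    # the orientation of the dominant eigenvector (strongest variation);
    # the smoothing/diffusion direction is orthogonal to it, and the
    # eigenvalue contrast is a proxy for how elongated the kernel should be.
    theta = 0.5 * np.arctan2(2.0 * Jxy, Jxx - Jyy)
    disc = np.sqrt((Jxx - Jyy) ** 2 + 4.0 * Jxy ** 2)
    lam1 = 0.5 * (Jxx + Jyy + disc)
    lam2 = 0.5 * (Jxx + Jyy - disc)
    anisotropy = (lam1 - lam2) / (lam1 + lam2 + 1e-12)
    return theta, anisotropy

# Toy usage: white noise smoothed with an elongated Gaussian yields an
# anisotropic field whose dominant orientation the tensor should recover.
rng = np.random.default_rng(0)
field = gaussian_filter(rng.standard_normal((128, 128)), sigma=(1.0, 6.0))
theta, aniso = kernel_orientation(*structure_tensor(field, sigma=4.0))
print(np.median(aniso))
```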

We considered only models where the phenomenon is assumed to originate from location and direction dependent diffusion of a Gaussian white noise process, as such providing methods to model anisotropy and spatial heterogeneity. This is certainly not the best choice for many processes; however, as stated in the introduction, it is one of the most general ways to express heterogeneity without too many a priori assumptions about the phenomenon.

5.2 Quantifying sensing quality

When the objective is to minimize the predictive uncertainty of the spatial model that is used to represent the phenomenon of interest, we proposed to use the mutual information between the model and the measurements. We proved that, in the case of Gaussian processes


or any other consistent estimator, for that matter, this mutual information can be derived from the mutual information between the selected and possible forthcoming observations, assuming the process is marginalized in the observation locations. We also proved that, when considering the expected predictive quality of the model, optimizing with respect to this mutual information provides the best results. This mutual information was compared in experiments to two other commonly used entropy based sensing quality criteria. As expected, when considering the expected predictive quality, the proposed mutual information was best. However, for the mean squared error it was often not the best solution, which was explained by the difference between the trace and the determinant of the posterior covariance.
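The trace versus determinant distinction can be reproduced with a small experiment. The sketch below uses the standard Gaussian process formulas, $I(F; Z_A) = \frac{1}{2}\log\det(K_{AA} + \sigma^2 I) - \frac{|A|}{2}\log\sigma^2$ for the mutual information and the posterior trace for the MSE view, on a hypothetical toy grid; it is not the exact criteria of this thesis.

```python
import numpy as np
from itertools import combinations

def se_cov(x, ell=1.5):
    # squared-exponential prior covariance on a 1-D grid
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)

grid = np.linspace(0.0, 10.0, 40)
K = se_cov(grid) + 1e-9 * np.eye(grid.size)
s2 = 0.05  # observation noise variance

def mutual_information(A):
    A = list(A)
    S = K[np.ix_(A, A)] + s2 * np.eye(len(A))
    return 0.5 * np.linalg.slogdet(S)[1] - 0.5 * len(A) * np.log(s2)

def posterior_trace(A):
    A = list(A)
    S = K[np.ix_(A, A)] + s2 * np.eye(len(A))
    return np.trace(K - K[:, A] @ np.linalg.solve(S, K[A, :]))

designs = list(combinations(range(0, 40, 5), 3))
print("best by MI:   ", max(designs, key=mutual_information))
print("best by trace:", min(designs, key=posterior_trace))
```

Depending on the noise level and correlation length, the two criteria may select different designs, mirroring the trace versus determinant discussion above.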

When the sensing path is uncertain, we proposed a mutual information based sensing quality criterion where the posterior model is marginalized over the sensing path distribution. The benefit is that this leads to a unified approach for sensing with and without sensing path uncertainties. We proved that this approach also leads to solutions that implicitly minimize localization uncertainty whenever it is necessary for the predictive quality of the model. We compared the results obtained with this sensing quality to a sensing quality that measures the mutual information between the joint distribution of the model and the sensing path, and the measurements. We observed that, in many cases, our sensing quality provides similar results in decreasing sensing path uncertainty, and preferable results in not favoring, as strongly, designs that have a large a priori sensing path uncertainty.

With the sensing qualities, we did not consider whether there could be situations where it would be more rational to optimize directly against the conditional entropy $h(F \mid Z_A)$. If we knew exactly the target entropy we are aiming at, this would make our approaches independent of a priori assumptions or model adaptation. In practice, however, it is quite clear and straightforward to define the target quality as a certain factor (e.g. 50 times the area size) of improvement in entropies.

5.3 Action planning

For action planning, we proposed to use graph approximation techniques to find approximately optimal sensing paths. In order to work with spatially dependent information, which is the usual and beneficial assumption in environmental modelling, we proposed a technique to map the sensing quality of subpaths to prices for each vertex (a toy version is sketched after this section's summary). This vertex price mapping was demonstrated with simple experiments that showed the benefit of our approach compared to cases where vertices were given unit prices. Finally,


in order to incorporate sensing path uncertainty into graph based path planning, we proposed several simplifications to represent marginal distributions over subpaths and to work with robotic motion models.
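As a rough illustration of the vertex price idea, one can spread each subpath's sensing quality over its vertices and average over the subpaths that visit a vertex. This uniform-spreading scheme is a hypothetical stand-in, not the price generation algorithm of this thesis:

```python
from collections import defaultdict

def vertex_prices(subpaths, quality):
    # Hypothetical stand-in for the vertex price generation: spread each
    # subpath's sensing quality q(P) uniformly over its vertices, then
    # average over all subpaths passing through the vertex. A quota TSP
    # solver can then collect vertex prices instead of subpath qualities.
    acc = defaultdict(float)
    cnt = defaultdict(int)
    for path in subpaths:
        share = quality(path) / len(path)
        for v in path:
            acc[v] += share
            cnt[v] += 1
    return {v: acc[v] / cnt[v] for v in acc}

# Toy usage with a diminishing-returns quality in the path length.
paths = [(0, 1, 2), (2, 3), (1, 3, 4, 5)]
print(vertex_prices(paths, quality=lambda p: len(p) ** 0.5))
```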

For path planning, we used a modified version of Garg's 2-approximation approach, which is currently the best known approximation algorithm for the general k-MST problem. For holonomic robots the path planning can be described in Euclidean coordinates. One could then try to use Arora's approach [40], which is known to be a PTAS for the Euclidean k-TSP. However, we do not know whether this PTAS property can be extended to the Euclidean quota TSP. Instead of using approximation algorithms, one could utilize sampling based search approaches that are guaranteed to converge asymptotically to the optimal solution. However, the authors do not know whether the search spaces of quota problems could be pruned such that they would allow computationally feasible solutions.
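For reference, the plain double-tree construction underlying the routes in Fig 72 can be sketched as follows. This is the standard MST-doubling heuristic with shortcutting; the informative shortcutting and Garg's quota machinery are not reproduced here, and networkx is used only for the MST and the Euler tour:

```python
import networkx as nx

def double_tree_tour(G, root):
    # Classic double-tree heuristic: double every MST edge so an Euler
    # circuit exists, then shortcut repeated vertices on the circuit.
    # In metric graphs this yields a tour of at most twice the MST weight.
    T = nx.minimum_spanning_tree(G, weight="weight")
    D = nx.MultiGraph()
    D.add_edges_from(T.edges(data=True))
    D.add_edges_from(T.edges(data=True))    # the "double-tree" edges
    tour, seen = [root], {root}
    for _, v in nx.eulerian_circuit(D, source=root):
        if v not in seen:
            seen.add(v)
            tour.append(v)
    tour.append(root)                        # close the tour
    return tour

# Toy usage on a small weighted graph.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5),
                           (2, 3, 2.0), (1, 3, 2.5)])
print(double_tree_tour(G, root=0))
```

In a metric graph the shortcutted tour is never longer than the Euler circuit, which is why the figures distinguish the plain shortcutted path from the informative variant that deliberately retains tree paths to collect enough information.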


References

1. Cressie N & Wikle C (2011) Statistics for Spatio-temporal Data. Wiley, New York, USA.
2. Jaynes E (1962) Information theory and statistical mechanics. Statistical Physics pp. 181–218.
3. Kemppainen A, Vallivaara I & Röning J (2015) Magnetic field SLAM exploration: Frequency domain Gaussian processes and informative route planning. In: Mobile Robots (ECMR), 2015 European Conference on, pp. 1–7.
4. Higdon D, Swall J & Kern J (1999) Non-stationary spatial modeling. In: Bayesian Statistics 6, eds. J.M. Bernardo et al., Oxford University Press, pp. 761–768.
5. Paciorek C (2003) Nonstationary Gaussian processes for regression and spatial modelling. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.
6. Plagemann C (2008) Gaussian processes for flexible robot learning. Ph.D. thesis, University of Freiburg, Department of Computer Science.
7. Lang T, Plagemann C & Burgard W (2007) Adaptive non-stationary kernel regression for terrain modeling. In: Robotics: Science and Systems (RSS). Atlanta, Georgia, USA.
8. Kemppainen A, Mäkelä T, Haverinen J & Röning J (2009) An adaptive model for spatial sampling design. In: 4th European Conference on Mobile Robots.
9. Lecture Notes: 03. Boltzmann Entropy, Gibbs Entropy, Shannon Information. http://ls.poly.edu/~jbain/physinfocomp/lectures/03.BoltzGibbsShannon.pdf. Accessed: 2016-12-21.
10. Kemppainen A, Haverinen J, Vallivaara I & Röning J (2011) Near-optimal SLAM exploration in Gaussian processes: Scalable optimality factor and model quality rating. In: European Conference on Mobile Robots (ECMR), Örebro, Sweden.
11. Kemppainen A, Haverinen J, Vallivaara I & Röning J (2010) Near-optimal SLAM exploration in Gaussian processes. In: IEEE 2010 International Conference on Multisensor Fusion and Integration for Intelligent Systems (IEEE MFI 2010).
12. Chaloner K & Verdinelli I (1995) Bayesian experimental design: A review. Statistical Science 10(3): 273–304.
13. Krause A (2008) Optimizing sensing: Theory and applications. Ph.D. thesis, Carnegie Mellon University.
14. Wolsey LA (1982) An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica 2(4): 385–393. URI: http://dx.doi.org/10.1007/BF02579435.
15. Nemhauser G, Wolsey L & Fisher M (1978) An analysis of approximations for maximizing submodular set functions–I. Mathematical Programming 14(1): 265–294.
16. Singh A, Krause A & Kaiser W (2009) Nonmyopic adaptive informative path planning for multiple robots. In: Proc. International Joint Conference on Artificial Intelligence (IJCAI).
17. Singh A, Krause A, Guestrin C, Kaiser W & Batalin M (2009) Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research (JAIR) 34: 707–755.
18. Blum A, Chawla S, Karger DR, Lane T, Meyerson A & Minkoff M (2007) Approximation algorithms for orienteering and discounted-reward TSP. SIAM J. Comput. 37(2): 653–670. URI: http://dx.doi.org/10.1137/050645464.
19. Garg N (2005) Saving an epsilon: A 2-approximation for the k-MST problem in graphs. In: Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, STOC '05, pp. 396–402. ACM, New York, NY, USA. URI: http://doi.acm.org/10.1145/1060590.1060650.
20. Chekuri C, Korula N & Pál M (2008) Improved algorithms for orienteering and related problems. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '08, pp. 661–670. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. URI: http://dl.acm.org/citation.cfm?id=1347082.1347155.
21. Chekuri C & Pal M (2005) A recursive greedy algorithm for walks in directed graphs. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS '05, pp. 245–253. IEEE Computer Society, Washington, DC, USA. URI: http://dx.doi.org/10.1109/SFCS.2005.9.
22. Binney J & Sukhatme GS (2012) Branch and bound for informative path planning. In: 2012 IEEE International Conference on Robotics and Automation, pp. 2147–2154.
23. Hollinger G & Sukhatme G (2014) Sampling-based robotic information gathering algorithms. International Journal of Robotics Research 33(9): 385–393.
24. Suh J, Cho K & Oh S (2016) Efficient graph-based informative path planning using cross entropy. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 5894–5899.
25. Ruiz AV, Wiedermann T, Manss C, Magel L, Muller J, Shutin D & Merino L (2015) A general algorithm for exploration with Gaussian processes in complex, unknown environments. In: International Conference on Robotics and Automation (ICRA).
26. Stachniss C, Grisetti G & Burgard W (2005) Information gain-based exploration using Rao-Blackwellized particle filters. In: Proceedings of Robotics: Science and Systems (RSS), pp. 65–72.
27. Kollar T & Roy N (2008) Efficient optimization of information-theoretic exploration in SLAM. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1369–1379.
28. Paciorek CJ & Schervish MJ (2004) Nonstationary covariance functions for Gaussian process regression. In: Proc. of the Conf. on Neural Information Processing Systems (NIPS). MIT Press.
29. Middendorf M & Nagel HH (2002) Empirically convergent adaptive estimation of grayvalue structure tensors. In: DAGM-Symposium, pp. 66–74. Springer-Verlag.
30. Lindeberg T & Garding J (1997) Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure. Image and Vision Computing 15(6): 415–435.
31. Guestrin C, Krause A & Singh A (2005) Near-optimal sensor placements in Gaussian processes. In: International Conference on Machine Learning (ICML).
32. Huber M, Bailey T, Durrant-Whyte H & Hanebeck U (2008) On entropy approximation for Gaussian mixture random vectors. In: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.
33. Krause A, Guestrin C, Gupta A & Kleinberg J (2006) Near-optimal sensor placements: Maximizing information while minimizing communication cost. In: Proc. of Information Processing in Sensor Networks (IPSN).
34. Chaudhuri K, Godfrey B, Rao S & Talwar K (2003) Paths, trees, and minimum latency tours. In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 36–45.
35. Ausiello G, Bonifaci V, Leonardi S & Marchetti-Spaccamela A (2007) Prize-collecting traveling salesman and related problems. In: Handbook of Approximation Algorithms and Metaheuristics, pp. 40.1–40.13. CRC Press.
36. Goemans MX & Williamson DP (1992) A general approximation technique for constrained forest problems. In: Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '92, pp. 307–316. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. URI: http://dl.acm.org/citation.cfm?id=139404.139468.
37. Deineko V & Tiskin A (2007) Fast minimum-weight double-tree shortcutting for metric TSP. pp. 136–149. Springer Berlin Heidelberg, Berlin, Heidelberg. URI: http://dx.doi.org/10.1007/978-3-540-72845-0_11.
38. Schrijver A (2000) A combinatorial algorithm minimizing submodular functions in strongly polynomial time. J. Comb. Theory Ser. B 80(2): 346–355. URI: http://dx.doi.org/10.1006/jctb.2000.1989.
39. Iwata S & Orlin JB (2009) A simple combinatorial algorithm for submodular function minimization. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '09, pp. 1230–1237. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. URI: http://dl.acm.org/citation.cfm?id=1496770.1496903.
40. Arora S (1998) Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM 45(5): 753–782. URI: http://doi.acm.org/10.1145/290179.290180.
