
Masters on Computer Engineering: Computer Security and Intelligent Systems

Thesis Paper

Geo-localization of an autonomous robot through the information taken from a camera

Advisor: Professor Dr. Francesc Serratosa Casanelles
[email protected]
http://deim.urv.cat/~francesc.serratosa/

Submitted by: Abu Sadat Mohammed Yasin
[email protected]
[email protected]


Index

Contents

Chapter 1: Introduction
Chapter 2: Algorithms
  2.1 Matlab Feature Extractor
    2.1.1 Speeded Up Robust Features (SURF)
  2.2 Iterative Closest Point (ICP)
  2.3 Distance Computation by Bipartite Algorithm (BP)
  2.4 Earth Mover's Distance (EMD)
Chapter 3: Geo-localize Point Through Images
Chapter 4: Database
Chapter 5: Experiments, Results and Conclusion
  5.1 Procedures/Experiments
    5.1.1 Step One
    5.1.2 Step Two
    5.1.3 Step Three
    5.1.4 Step Four
    5.1.5 Step Five
  5.2 Results and comparisons
  5.3 Conclusion
References


Chapter 1: Introduction

The aim of this thesis is to define a methodology to localise the position of a camera from the given set of images taken from a scene. For instance, suppose we have some images of “SagradaFamilia” (Figure 1).

Figure 1: “SagradaFamilia” images as reference or database

We also know the position and orientation of the camera while these pictures were taken. Then, we have an autonomous robot that has taken a new image (Figure 2) and now it wants to know its position in the world.

Figure 2: Image taken by the autonomous robot's camera.

The aim of this thesis is to provide a method that obtains the current position of the autonomous robot given this image. The method may only use the information of the taken image, the images in the database and their geo-localization. Besides, it cannot use any existing geo-localization system such as GPS or the Internet. The method has to look for the closest image and assume the autonomous robot's position is the same as (or very close to) the position labelled with that database image. The closest image is the one that obtains the minimum distance, or the maximum number of matching points or similarities, with the autonomous robot's input image. Note that reducing the run time is crucial when defining and implementing the algorithms, because an autonomous robot cannot reduce its speed to have time to geo-localize.


Three methods, the Matlab Feature Extractor SURF, Iterative Closest Point-Bipartite (ICP-BP) and Earth Mover's Distance (EMD), have been implemented and compared in terms of minimum error, accuracy and time. At the beginning of this thesis what we had is the “SagradaFamilia” dataset, which consists of 478 pictures (some are shown above in Figure 1), a 3D-points database in Matlab ".mat" format, a 2D-points database in Matlab ".mat" format, and another database named "camera_position.mat" that contains the 3D coordinates of the camera position for every image [1][2]. Note that this dataset contains two rounds of “SagradaFamilia” images. These databases are briefly described in Chapter 4: Database. We have extracted salient 3D points, 2D points and histograms for each image and then implemented these three methods. The Matlab Feature Extractor SURF works on 2D points (Figure 3); it extracts feature information and finds the number of matching points or similarities between two images.

Figure 3: Salient 2D points of image 2889

Iterative Closest Point-Bipartite (ICP-BP) works on 3D points (Figure 4); ICP and BP together calculate the distance between two images.

Figure 4: Salient 3D points of image 2889


Earth Mover's Distance (EMD) works on image histograms (Figure 5) and provides the distance between two images.

Figure 5: Histogram of the gray scale image 2889

After implementing SURF, ICP-BP and EMD, we have calculated the accuracy in terms of minimum error, neighbour image position and the time elapsed to get the output. All of this is described in Chapter 5.


Chapter 2: Algorithms

This chapter contains the algorithms that are used in the experiments of this thesis. We only provide an overview of these algorithms.

2.1 Matlab Feature Extractor

Feature extraction is a type of dimensionality reduction that efficiently represents interesting parts of an image as a compact feature vector. This approach is useful when image sizes are large and a reduced feature representation is required to quickly complete tasks such as image matching and retrieval. Feature detection, feature extraction, and matching are often combined to solve common computer vision problems such as object detection and recognition, content-based image retrieval, face detection and recognition, and texture classification.

Common feature extraction techniques are Histogram of Oriented Gradients (HOG), Speeded Up Robust Features (SURF), Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), Haar wavelets, and colour histograms. SURF is the fastest among these feature extraction techniques, and in our experiments we have used the SURF feature extraction technique [3].

2.1.1 Speeded Up Robust Features (SURF)

SURF [4] is a robust local feature detector and descriptor. The SURF algorithm is partly inspired by its predecessor, the Scale-Invariant Feature Transform (SIFT), and the standard version of SURF is several times faster than SIFT. SURF detects and describes points of interest in an image by transforming the image into coordinates, using a multi-resolution technique. A minimal Matlab usage sketch is given after the algorithm steps below.

Algorithm [4]: The SURF algorithm has the following steps.

1. Interest Point Detection
   I. Integral Images
   II. Hessian Matrix Based Interest Points
   III. Scale Space Representation
   IV. Interest Point Localisation

2. Interest Point Description and Matching
   I. Orientation Assignment
   II. Descriptor Based on Sum of Haar Wavelet Responses
   III. Fast Indexing for Matching
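As an illustration of how SURF is used in our experiments, the following is a minimal Matlab sketch (the file names are placeholders, not the thesis data); it relies on the Computer Vision Toolbox functions detectSURFFeatures, extractFeatures and matchFeatures.

    % Read two grayscale images (PGM, as used in this thesis); file names are placeholders.
    I1 = imread('input.pgm');
    I2 = imread('reference.pgm');

    % Detect SURF interest points and extract their 64-value descriptors.
    p1 = detectSURFFeatures(I1);
    p2 = detectSURFFeatures(I2);
    [f1, vp1] = extractFeatures(I1, p1);
    [f2, vp2] = extractFeatures(I2, p2);

    % Match the descriptors; the number of rows of indexPairs is the
    % number of matching points used as the similarity score.
    indexPairs = matchFeatures(f1, f2);
    numMatches = size(indexPairs, 1);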

2.2 Iterative Closest Point (ICP): ICP [5] is an algorithm employed to minimize the difference between two clouds of points. In this algorithm, one point cloud, the reference, or target, is kept fixed, while the other one, the source, is transformed to best match the reference. The algorithm iteratively revises the transformation (combination of translation and rotation) needed to minimize the distance from the source to the reference point cloud.


ICP takes as input the reference and source point clouds, optionally an initial estimate of the transformation that aligns the source to the reference, and the criteria for stopping the iterations; it outputs a refined transformation. A minimal sketch of the steps is given after the list below.

Algorithm:

1. For each point in the source point cloud, find the closest point in the reference point cloud.

2. Estimate the combination of rotation and translation using a mean squared error cost function that will best align each source point to its match found in the previous step.

3. Transform the source points using the obtained transformation.

4. Iterate (re-associate the points, and so on).
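The following is a minimal, hypothetical Matlab sketch of these four steps (it is not the thesis implementation); it assumes Nx3 point matrices, Matlab R2016b or later for implicit expansion, and uses knnsearch from the Statistics and Machine Learning Toolbox for the closest-point search.

    % Minimal ICP sketch: src and ref are Nx3 and Mx3 point clouds.
    function [R, t, src] = icp_sketch(src, ref, nIter)
        R = eye(3); t = zeros(1, 3);
        for k = 1:nIter
            idx = knnsearch(ref, src);            % 1. closest reference point for each source point
            matched = ref(idx, :);
            muS = mean(src, 1); muR = mean(matched, 1);
            H = (src - muS)' * (matched - muR);   % 2. cross-covariance for the least-squares fit
            [U, ~, V] = svd(H);
            Rk = V * diag([1 1 sign(det(V * U'))]) * U';   % rotation without reflection
            tk = muR - muS * Rk';
            src = src * Rk' + tk;                 % 3. transform the source points
            R = Rk * R; t = t * Rk' + tk;         % accumulate the transformation
        end                                       % 4. iterate
    end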

2.3 Distance Computation by Bipartite Algorithm (BP) [6][7][8]: The key idea of graph matching is to define a dissimilarity measure for graphs or vectors. In contrast to statistical pattern recognition, where patterns are described by vectors, graphs do not offer a straightforward distance model like the Euclidean distance. A common way to define the dissimilarity of two graphs or vectors is to determine the minimal amount of distortion that is needed to transform one graph or vector into the other. These distortions are given by insertions, deletions, and substitutions of nodes and edges.

Given two graphs or vectors, the source graph g1 and the target graph g2, the idea is to delete some nodes and edges from g1, relabel some of the remaining nodes and edges (substitutions) and possibly insert some nodes and edges, such that g1 is finally transformed into g2. A sequence of edit operations that transforms g1 into g2 is called an edit path between g1 and g2. One can introduce cost functions for each edit operation measuring the strength of the given distortion. The idea of such cost functions is that one can define whether or not an edit operation represents a strong modification of the graph or vector. Hence, between two structurally similar graphs or vectors there exists an inexpensive edit path, representing low-cost operations, while for structurally different graphs or vectors an edit path with high costs is needed. Consequently, the edit distance of two graphs or vectors is defined by the minimum-cost edit path between them.

In the following we denote a graph by g = (V, E, α, β), where V denotes a finite set of nodes, E ⊆ V × V a set of directed edges, α : V → LV a node labelling function assigning an attribute from LV to each node, and β : E → LE an edge labelling function. The substitution of a node u by a node v is denoted by u → v, the insertion of u by ε → u, and the deletion of u by u → ε. The edit distance can be computed by a tree search algorithm, where possible edit paths are iteratively explored, and the minimum-cost edit path can finally be retrieved from the search tree [8]. This method allows us to find the optimal edit path between two graphs or vectors.
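As a hedged illustration only (not the BP implementation of [6][7][8]), the assignment step at the core of the bipartite approximation can be sketched in Matlab with matchpairs (R2019a or later) and pdist2 from the Statistics and Machine Learning Toolbox, assuming Euclidean substitution costs between node attributes and a constant insertion/deletion cost:

    % A and B are n1 x d and n2 x d matrices of node attributes;
    % costIndel is the assumed cost of one insertion or deletion.
    function d = bp_distance_sketch(A, B, costIndel)
        C = pdist2(A, B);                                    % substitution cost matrix (n1 x n2)
        [M, unA, unB] = matchpairs(C, costIndel);            % optimal assignment with unmatched nodes
        subCost = sum(C(sub2ind(size(C), M(:, 1), M(:, 2))));
        d = subCost + costIndel * (numel(unA) + numel(unB)); % substitutions + insertions/deletions
    end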

2.4 Earth Mover's Distance (EMD): EMD is a method to evaluate the dissimilarity between two multi-dimensional distributions in some feature space where a distance measure between single features, called the ground distance, is given. For example, given two distributions, one can be seen as a mass of earth properly spread in space and the other as a collection of holes in that same space. The EMD then measures the least amount of work needed to fill the holes with earth, where a unit of work corresponds to transporting a unit of earth by a unit of ground distance [9]. The EMD "lifts" this distance from individual features to full distributions.

It is valid only if the two distributions have the same integral, as in normalized histograms or probability density functions. The EMD (ordinal distance) [8] between two histograms is the minimum amount of work needed to transform one histogram into the other. The histogram H(A) of the input image can be transformed into the histogram H(B) of the reference image by moving elements to the left or right, and the total of all necessary minimum movements is the distance between them.

The distance between two histograms is defined in Ref. [8] as follows:

D_{ord}(H(A), H(B)) = \sum_{i=1}^{T-1} \left| \sum_{j=1}^{i} \left( H_j(A) - H_j(B) \right) \right|

where T is the number of histogram bins.

The pseudo-code of EMD is described in Ref. [10]
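For grayscale-image histograms this one-dimensional distance reduces to comparing cumulative histograms. The following Matlab sketch (file names are placeholders, not the thesis data) assumes the Image Processing Toolbox function imhist:

    % Normalised 256-bin histograms of the input and reference images.
    A = imread('input.pgm');    B = imread('reference.pgm');
    hA = imhist(A) / numel(A);  hB = imhist(B) / numel(B);

    % D_ord(H(A), H(B)): sum of absolute differences of the cumulative histograms.
    d = sum(abs(cumsum(hA) - cumsum(hB)));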


Chapter 3: Geo-localize Point Through Images

Geo-localization is the identification of the real-world geographic location of an object. It is closely related to the use of geographic coordinate positioning systems, for example for a radar source, a mobile phone or an Internet-connected computer terminal.

For either geo-locating or positioning, the locating engine often uses radio frequency (RF) location methods, for example Time Difference Of Arrival (TDOA), for precision. TDOA systems often utilise mapping displays or other geographic information systems. When a GPS signal is unavailable, geo-localization applications can use information from cell towers to triangulate the approximate position, a method that is not as accurate as GPS but has greatly improved in recent years. This is in contrast to earlier radiolocation technologies, for example direction finding, where a line of bearing to a transmitter is obtained as part of the process.

Internet and computer geo-localization can be performed by associating a geographic location with the Internet Protocol (IP) address, MAC address, RFID, hardware embedded article/production number, embedded software number (such as UUID, Exif/IPTC/XMP or modern steganography), invoice, Wi-Fi positioning system, device fingerprint, canvas fingerprinting or device GPS coordinates, or other, perhaps self-disclosed, information. Geo-localization usually works by automatically looking up an IP address on a WHOIS service and retrieving the registrant's physical address [11].

IP address location data can include information such as country, region, city, postal/zip code, latitude, longitude and time zone. Deeper data sets can determine other parameters such as domain name, connection speed, ISP, language, proxies, company name, US DMA/MSA, NAICS codes, and home/business.

For an autonomous robot, geo-localizing itself without access to any outside technology or system such as RF, GPS, mobile networks, the Internet or IP addresses is the novelty of this work. The robot has one 3D camera, one 2D camera and, for a specific location like “SagradaFamilia”, a database consisting of the surrounding images, their 3D and 2D coordinates and the camera positions of all these images. The system works offline and is reasonably fast at finding the robot's own position. In the next two chapters we describe the databases, our newly implemented approaches and the experiments.


Chapter 4: Database

The database was created as follows. We used a sequence of 360-degree 2D-pictures taken of the “SagradaFamilia” church in Barcelona (Spain). We have a total of 478 pictures, and among them there are two rounds of 360-degree 2D-pictures: the first round was taken in increments of approximately one degree with respect to the centre of the church, and the second round in increments of more than one degree with respect to the centre of the church. Given the whole sequence, we used the method called Bundler [12] to extract 100,532 3D-points of the church and the information of which 2D-pictures visualise each 3D-point. Each picture captures from 4,000 to 40,000 3D-points. Moreover, the method returns the relation between the 3D-points and their positions in pixels in the pictures. Then, the positions of the cameras were deduced by the pose estimation method presented in [13].

Figure 6 shows the obtained 3D model of “SagradaFamilia” (red points), and the different poses of the camera that has captured the images of the model (blue points). Axes are expressed in meters and the centre of the church is the origin of the coordinate system. Note there are some noisy points in the sky.

Figure 6. “SagradaFamilia” point cloud (red points) and the camera poses (blue points) in a 3D coordinate system (in meters) in which the origin of the axes is the centre of the church.

We have three database files stored as Matlab matrices.

The first matrix consists of 3D-points in Matlab "*.mat" format with size 3x100,532, where

1. 3 represents the 3D coordinates (u, v, w), and
2. 100,532 represents the total number of 3D-points of the church.


The second matrix consists of 2D information with size 102x3x100,532 saved in Matlab "*.mat" format, where

1. 102 means that each 3D point of the church is visualised by at most 102 of the 2D images,
2. 3 represents (i, x, y), where i is the picture number (varying from 1 to 478, the total number of pictures) and (x, y) are the 2D coordinates, and
3. 100,532 represents the total number of 3D-points.

The third and last matrix consists of the camera positions with size 478x3, where

1. 478 represents the total number of images, and
2. 3 represents the 3D coordinates (u, v, w) of the camera used to capture each picture.

In our experiments, we have used the odd-numbered images (from 1 to 478) as test inputs and the even-numbered images (from 1 to 478) as the reference or database.
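For illustration, the databases and the odd/even split can be loaded in Matlab roughly as follows; the file names of the first two matrices and the variable names are assumptions, only camera_position.mat is named in the text:

    points3D = load('points3D.mat');          % assumed name: 3 x 100,532 matrix of (u, v, w)
    points2D = load('points2D.mat');          % assumed name: 102 x 3 x 100,532 array of (i, x, y)
    cameras  = load('camera_position.mat');   % 478 x 3 matrix of camera positions (u, v, w)

    inputIds     = 1:2:478;   % odd-numbered images used as test inputs
    referenceIds = 2:2:478;   % even-numbered images used as the reference database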


Chapter 5: Experiments, Results and Conclusion

5.1 Procedures/Experiments

We did our experiment in five steps.

5.1.1 Step One: At first we had all the images in JPG format, but we converted them to PGM format using the software IrfanView; we did this because Matlab SURF works on grayscale images only and EMD performs faster on grayscale images. The 3D-points database is a matrix of size 3x100,532, and working with the 478 pictures and all 100,532 3D-points is very time consuming. This 3D-points database contains many unnecessary or noisy points, so we wanted to reduce them and work only on the important or salient points. In this first step we used the following algorithm to obtain the salient points. The algorithm generates a file named "salients.mat" containing a matrix of size 100,532x1, with one row per 3D-point. Each cell of the salients matrix has the value 1 if the point is important or salient and 0 otherwise.

Algorithm:

    % maxDist is the threshold; points3D is the 3 x 100,532 matrix of 3D-points.
    maxDist = 0.6;
    nPoints = size(points3D, 2);
    salients = ones(nPoints, 1);
    for i = 1:nPoints
        for j = i+1:nPoints
            % comparison on the difference of the Euclidean norms, as in the thesis
            if norm(points3D(:, i)) - norm(points3D(:, j)) < maxDist
                salients(j) = 0;
            end
        end
    end
    save('salients.mat', 'salients');

5.1.2 Step Two: In this step, for every image we have generated an individual database and saved it in Matlab ".mat" format ("pictures_label.mat"). Each single-image database contains, for every salient 3D point of that image, the equivalent 2D and 3D information and the feature extraction values. Each image database is an (n x 72) matrix, where n varies from 0 to 100,532 and the columns contain values according to Table 1.

Column | Values/Information
1 | Identifier / salient point number of the 3D-point (from 1 to 100,532).
2 to 3 | Equivalent 2D coordinates (x, y).
4 to 6 | Equivalent 3D coordinates (u, v, w).
7 to 8 | Equivalent 2D coordinates (x', y') relative to the image size, where the image's centre pixel is the origin (0, 0) of the X and Y axes.
9 to 72 | Matlab SURF feature extraction values (64 values).

Table 1. Individual image database.

The algorithm used is given below:

Algorithm:

    for i = every image
        matrix p = empty
        for j = every 3D-point in the 2D information database
            if j is a salient point
                get the 2D (x, y), 3D (u, v, w), 2D (x', y') and SURF feature extraction values
                add them as a new row in the matrix p
            end of if
        end of for
        save the matrix p, named after the image name, in Matlab ".mat" format
    end of for

5.1.3 Step Three: In this step we have used the odd-numbered images (239 in total) as input or test images and the even-numbered images as reference images. We have calculated the number of matching points or similarities between two images using Matlab SURF and generated a matrix from the results. Indeed, we have made two versions of the algorithm in this step.

In the first version we have generated a matrix of size 239x288, where the 239 rows represent the outputs for the 239 input images and the columns contain values according to Table 2.

Column | Values/Information
1 to 239 | Number of matched points or similar points between the input image and each of the 239 reference images.
240 | First maximum number of matching points among columns 1 to 239.
241 | Identifier (even numbers from 1 to 478) of the reference image whose number of matching points with the input image is the maximum (column 240).
242 | Label/index number of the reference image of column 241.
243 to 245 | Camera position (u, v, w) of the reference image of column 241.
246 | Second maximum number of matching points among columns 1 to 239.
247 | Identifier of the reference image with the second maximum number of matching points (column 246).
248 | Label/index number of the reference image of column 247.
249 to 251 | Camera position (u, v, w) of the reference image of column 247.
252 | Third maximum number of matching points among columns 1 to 239.
253 | Identifier of the reference image with the third maximum number of matching points (column 252).
254 | Label/index number of the reference image of column 253.
255 to 257 | Camera position (u, v, w) of the reference image of column 253.
258 | Fourth maximum number of matching points among columns 1 to 239.
259 | Identifier of the reference image with the fourth maximum number of matching points (column 258).
260 | Label/index number of the reference image of column 259.
261 to 263 | Camera position (u, v, w) of the reference image of column 259.
264 | Average of the first maximum matching value, equal to (column 240)/1.
265 to 267 | Average 3D position for the first maximum match: column 243/1, column 244/1 and column 245/1.
268 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 265 to 267.
269 | Average of the first two maximum matching values: (column 240 + column 246)/2.
270 to 272 | Average 3D position for the first two maximum matches: (243+249)/2, (244+250)/2 and (245+251)/2.
273 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 270 to 272.
274 | Average of the first three maximum matching values: (column 240 + column 246 + column 252)/3.
275 to 277 | Average 3D position for the first three maximum matches: (243+249+255)/3, (244+250+256)/3 and (245+251+257)/3.
278 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 275 to 277.
279 | Average of the first four maximum matching values: (column 240 + column 246 + column 252 + column 258)/4.
280 to 282 | Average 3D position for the first four maximum matches: (243+249+255+261)/4, (244+250+256+262)/4 and (245+251+257+263)/4.
283 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 280 to 282.
284 | Minimum of the four errors (columns 268, 273, 278 and 283).
285 | Number (1 to 4) of the average that gives the minimum error.
286 to 288 | 3D position (3D coordinates) of this best average.

Table 2. SURF experiment one: column description of the generated matrix.

First version algorithm:

    matrix Max_Match_Matrix
    for all input images (odd-numbered images from 1 to 478)
        read the input image's camera position from camera_position.mat
        read the input image's database from the file named after the image name (".mat")
        array Matches
        for all reference images (even-numbered images from 1 to 478)
            read the reference image's database from the file named after the image name (".mat")
            read the feature extraction values from the input image's database
            read the feature extraction values from the reference image's database
            get the number of matches between these two sets of feature values using Matlab SURF matching
            add this number of matching points to the array Matches
        end of for
        find the four maximum matching values
        find the identifiers (from 1 to 478) of the four images with the maximum matching values
        find the indices or labels of these four images
        find the four errors (Euclidean distances) between the input image's camera position and the camera positions of these four images
        find the average of the first maximum matching value, its average position and the error (Euclidean distance) for this position
        find the average of the first two maximum matching values, their average position and the error (Euclidean distance) for this position
        find the average of the first three maximum matching values, their average position and the error (Euclidean distance) for this position
        find the average of the first four maximum matching values, their average position and the error (Euclidean distance) for this position
        find the minimum error among these four errors
        find which average is the best, i.e. gives the minimum error (value from 1 to 4)
        find the average 3D position of this best average
        add all of these values as a new row of the matrix Max_Match_Matrix
    end of for
    save the matrix in Matlab ".mat" format

The second version of this algorithm is the same, but here we take only the single maximum match, and it also produces a matrix. This matrix has size 239x246, where the 239 rows represent the outputs for the 239 input images and the columns contain values according to Table 3.

Column | Values/Information
1 to 239 | Number of matched points or similar points between the input image and each of the 239 reference images.
240 | Maximum number of matching points among columns 1 to 239.
241 | Identifier (even numbers from 1 to 478) of the reference image whose number of matching points with the input image is the maximum (column 240).
242 | Label/index number of the reference image of column 241.
243 to 245 | Camera position (u, v, w) of the reference image of column 241.
246 | Error (Euclidean distance) between the input image's camera position and the camera position of the best-matched reference image.

Table 3. SURF experiment two: column description of the generated matrix.


Second version algorithm:

    matrix Max_Match_Matrix
    for all input images (odd-numbered images from 1 to 478)
        read the input image's camera position from camera_position.mat
        read the input image's database from the file named after the image name (".mat")
        array Matches
        for all reference images (even-numbered images from 1 to 478)
            read the reference image's database from the file named after the image name (".mat")
            read the feature extraction values from the input image's database
            read the feature extraction values from the reference image's database
            get the number of matches between these two sets of feature values using Matlab SURF matching
            add this number of matching points to the array Matches
        end of for
        find the maximum matching value
        find the identifier (from 1 to 478) of the image with the maximum matching value
        find the index or label of this image
        find the error (Euclidean distance) between the input image's camera position and the camera position of this image
        add all of these values as a new row of the matrix Max_Match_Matrix
    end of for
    save the matrix in Matlab ".mat" format
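As a hedged sketch of the core loop of this second version (the per-image .mat layout, file naming and the loadFeatures helper are hypothetical, not the thesis code), the nearest-image search could look like this in Matlab:

    % For one odd-numbered input image: find the even-numbered reference image
    % with the largest number of SURF matches and take its camera position.
    camData = load('camera_position.mat');             % assumed to contain a 478 x 3 matrix
    inputFeat = loadFeatures('resized_IMG_2989.mat');  % hypothetical helper: columns 9 to 72 of the image database
    bestId = 0; bestMatches = -1;
    for refId = 2:2:478
        refFeat = loadFeatures(sprintf('reference_%03d.mat', refId));  % hypothetical naming
        nMatches = size(matchFeatures(inputFeat, refFeat), 1);         % Matlab SURF matching
        if nMatches > bestMatches
            bestMatches = nMatches; bestId = refId;
        end
    end
    estimatedPosition = camData.camera_position(bestId, :);   % assumed field name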

5.1.4 Step Four: This step also has two versions, like step three, and it takes the same input as step three. In this step we calculate the distance between two images. Indeed, we use the same algorithm as in step three, but instead of SURF we use ICP and BP, and instead of the maximum matching values we take the minimum distances.

To calculate the distances we use the Iterative Closest Point (ICP) and Bipartite (BP) algorithms together. ICP iteratively revises the transformation (a combination of translation and rotation) needed to minimize the distance from the source to the reference image's 3D points, and then BP is used to calculate the distance between the input and reference images.
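Under the same assumptions as the Chapter 2 sketches (icp_sketch and bp_distance_sketch are hypothetical helpers, srcPoints3D and refPoints3D are assumed Nx3 matrices of salient points, and the iteration count and insertion/deletion cost are assumed values), this combination could be sketched as:

    % Align the input image's salient 3D points to a reference image's points
    % with ICP, then score the aligned sets with the bipartite distance.
    [~, ~, alignedSrc] = icp_sketch(srcPoints3D, refPoints3D, 20);   % 20 assumed iterations
    d = bp_distance_sketch(alignedSrc, refPoints3D, 1.0);            % assumed indel cost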

The first version generates a matrix of size 239x288, where the 239 rows represent the outputs for the 239 input images and the columns contain values according to Table 4.


Column | Values/Information
1 to 239 | Distance between the input image and each of the 239 reference images.
240 | First minimum distance among columns 1 to 239.
241 | Identifier (even numbers from 1 to 478) of the reference image whose distance to the input image is the minimum (column 240).
242 | Label/index number of the reference image of column 241.
243 to 245 | Camera position (u, v, w) of the reference image of column 241.
246 | Second minimum distance among columns 1 to 239.
247 | Identifier of the reference image with the second minimum distance (column 246).
248 | Label/index number of the reference image of column 247.
249 to 251 | Camera position (u, v, w) of the reference image of column 247.
252 | Third minimum distance among columns 1 to 239.
253 | Identifier of the reference image with the third minimum distance (column 252).
254 | Label/index number of the reference image of column 253.
255 to 257 | Camera position (u, v, w) of the reference image of column 253.
258 | Fourth minimum distance among columns 1 to 239.
259 | Identifier of the reference image with the fourth minimum distance (column 258).
260 | Label/index number of the reference image of column 259.
261 to 263 | Camera position (u, v, w) of the reference image of column 259.
264 | Average of the first minimum distance, equal to (column 240)/1.
265 to 267 | Average 3D position for the first minimum distance: column 243/1, column 244/1 and column 245/1.
268 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 265 to 267.
269 | Average of the first two minimum distances: (column 240 + column 246)/2.
270 to 272 | Average 3D position for the first two minimum distances: (243+249)/2, (244+250)/2 and (245+251)/2.
273 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 270 to 272.
274 | Average of the first three minimum distances: (column 240 + column 246 + column 252)/3.
275 to 277 | Average 3D position for the first three minimum distances: (243+249+255)/3, (244+250+256)/3 and (245+251+257)/3.
278 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 275 to 277.
279 | Average of the first four minimum distances: (column 240 + column 246 + column 252 + column 258)/4.
280 to 282 | Average 3D position for the first four minimum distances: (243+249+255+261)/4, (244+250+256+262)/4 and (245+251+257+263)/4.
283 | Error (Euclidean distance) between the input image's camera position and the average 3D position of columns 280 to 282.
284 | Minimum of the four errors (columns 268, 273, 278 and 283).
285 | Number (1 to 4) of the average that gives the minimum error.
286 to 288 | 3D position (3D coordinates) of this best average.

Table 4. ICP-BP experiment one: column description of the generated matrix.

The second version is the same as in step three, but we use ICP-BP to calculate the distances, take only the single minimum distance, compute the corresponding positions and errors, and save them in a matrix of size 239x246, where the 239 rows represent the outputs for the 239 input images and the columns contain values according to Table 5.

Column | Values/Information
1 to 239 | Distance between the input image and each of the 239 reference images.
240 | Minimum distance among columns 1 to 239.
241 | Identifier (even numbers from 1 to 478) of the reference image whose distance to the input image is the minimum (column 240).
242 | Label/index number of the reference image of column 241.
243 to 245 | Camera position (u, v, w) of the reference image of column 241.
246 | Error (Euclidean distance) between the input image's camera position and the camera position of the minimum-distance reference image.

Table 5. ICP-BP experiment two: column description of the generated matrix.

5.1.5 Step Five: This step also takes the same input and produces matrices like steps three and four. In this step, to calculate the distances, we use the Earth Mover's Distance (EMD). The algorithm is the same as in step four, but instead of reading the input and reference image databases we read the images directly and take the image histograms as input and reference. We did two experiments and saved two matrices with sizes 239x288 and 239x246; these matrices contain values according to Table 4 and Table 5, respectively.

5.2 Results and comparisons

We have six matrices:

1. Two matrices with maximum matching points or similarities by SURF,
2. Two matrices with minimum distances by ICP-BP, and
3. Two matrices with minimum distances by EMD.

When we display the matrices (rows as input images and columns as reference images, 1 to 239) as images, they look like Figures 7, 8 and 9. In the graphical view we have plotted the matrix values in a reduced range, although the actual values in the matrices are much larger. To rescale the three matrices we divide every cell by the mean of its matrix:

reduced_matrix = matrix / mean(mean(matrix))
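For reference, the rescaling and the image view of Figures 7 to 9 can be reproduced with a couple of Matlab lines (here M stands for any of the 239x239 blocks described above; this is our sketch, not the thesis plotting code):

    reduced = M ./ mean(M(:));            % divide every cell by the mean of the matrix
    imagesc(reduced); colormap(gray);     % white = large values, black = small values
    colorbar;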

In Figures 7, 8 and 9, white represents the maximum values and black the minimum values. Figure 7 shows the matching points or similarities computed by SURF, and Figures 8 and 9 show the distances computed by ICP-BP and EMD.

Figure 7. Graphical View of (Column 1 to 239) Matching Points or Similarities Matrix by SURF


Figure 8. Graphical View of (Column 1 to 239) Distance Matrix by ICP-BP

Figure 9. Graphical View of (Column 1 to 239) Distance Matrix by EMD.


In all three of these images, we can clearly see that each has a diagonal marked with a red line and the label "A". In Figure 7, the diagonal labelled "A" represents the maximum matching or similarity points between the 239 input and 239 reference images. In Figures 8 and 9, the diagonal labelled "A" represents the minimum distances between the 239 input and 239 reference images. Remember, we have used all the odd-numbered images as input and all the even-numbered images as reference from the “SagradaFamilia” dataset.

Besides label "A" there are two more interesting lines, labelled "B" and "C", clearly visible in Figures 7 and 8. We have already mentioned that we took a total of 478 pictures of the “SagradaFamilia” church in Barcelona (Spain), and that among them there are two rounds of 360-degree 2D-pictures. The two lines "B" and "C" appear because of this second round of “SagradaFamilia” pictures: due to the second round, every input or source image has two very closely matching reference images. Figure 9 also contains two similar lines like "B" and "C" of Figures 7 and 8, but they are hidden in the image because we have rescaled the matrix values.

For example, when we took "resized_IMG_2989.jpg" (the 51st image out of 478, Figure 10[A]) as input, we got two very closely matched images from the dataset: "resized_IMG_2988.jpg" (the 50th image out of 478, Figure 10[B]) and "resized_IMG_4050.jpg" (the 203rd image out of 478, Figure 10[C]). If we observe these two reference images in Figures 10(B) and 10(C), we can see that they are indeed very close or nearly the same.


Figure 10. For the input image A there are two close images, B and C, from the dataset.

Figures 11, 12 and 13 each display one row of the corresponding matrix for the input image (the 51st image out of 478, Figure 10[A]). Figure 11 represents the matches or similarities between this input image and all reference images calculated by SURF; here we can see two peak values marked with red circles. These two peaks represent the two reference images of the example above, Figures 10(B) and 10(C).

Figure 11. Graph for the input image (the 51st image out of 478, Figure 10[A]) and all the reference images (columns 1 to 239) of the matching points or similarities matrix by SURF.


Figure 12. Graph for the input image (the 51st image out of 478, Figure 10[A]) and all the reference images (columns 1 to 239) of the distance matrix by ICP-BP.

Figure 12 represents the distances between the input image (the 51st image) and all reference images calculated by ICP-BP. If we zoom in on Figure 12(A), we can see two minimum values in two separate positions, shown in Figures 12(B) and 12(C).


Figure 13 represents the distances between the input image (the 51st image) and all reference images calculated by EMD. If we observe Figure 13 closely, we can see two minimum peaks.

Figure 13. Graph for the input image (the 51st image out of 478, Figure 10[A]) and all the reference images (columns 1 to 239) of the distance matrix by EMD.

Figures 11, 12 and 13 show that our implemented approach provides good results and that it can detect the minimum-distance or maximum-match images precisely for every round of taken images (especially SURF, in Figure 11).

Now, we have summed up the number of minimum errors (Euclidean distances) obtained by the first (1N), second (2N), third (3N) and fourth (4N) averages from column 285 of every matrix (see Table 2 or 4) and display the counts in Figure 14. We give higher priority from 1N to 4N. Before seeing the practical results we had assumed that 2N would give the maximum, but the results disappointed us, and that is why we implemented the second version of every algorithm. However, from Figure 14 we can say that SURF provides the best result, because for 1N and 2N SURF provides the largest number of good outputs.

Figure 14. Accuracy in terms of minimum errors.
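For reference, the counts plotted in Figure 14 can be obtained from column 285 of each result matrix with a sketch like the following (M is one of the 239x288 matrices; histcounts requires Matlab R2014b or later; this is our sketch, not the thesis code):

    % Column 285 stores which of the four averages (1N to 4N) gave the minimum error.
    counts = histcounts(M(:, 285), 0.5:1:4.5);   % number of times 1N, 2N, 3N and 4N win
    bar(counts);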


Now, the question is: why do the average positions not provide the best result all the time? We think that if the input image is situated in the middle between the two maximum-matched or minimum-distance images (Figure 15), then this averaging technique provides a good result, but if the input image is not situated in the middle of the two maximum-matched or minimum-distance images (Figure 16), then it does not. We think that in our experiments most of the input images are not situated in the middle of two or more maximum-matched or minimum-distance images.

Figure 15. Assumed input image in the middle of two maximum or minimum images

Figure 16. Assumed input image besides (not in the middle) of two maximum or minimum images

In the second version of every algorithm we only took the first maximum matching value or minimum distance and then calculated its accuracy in terms of neighbour images (Figure 10). A neighbour image means the next or previous image of the input image from the same dataset. In our experiments, from the 478 images of the “SagradaFamilia” dataset, for an input image i the neighbour images are i+1, i-1, i+2 and i-2 (Figure 17).

i-3 i-2 i-1 i i+1 i+2 i+3 i+4

Figure 17. Neighbour images of image i


Indeed, in our experiments we already know every image's position and its neighbours. For each input image we have checked whether the resulting image identifier is in a neighbour position or not, and we have summed up the total number of outputs whose identifier is in a neighbour position i-1, i+1, i-2 or i+2 (Figure 17). After this calculation, we found that SURF provides the most correct results (Figure 18).

Figure 18. Accuracy in terms of neighbour reference images.
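The neighbour check described above is a one-liner in Matlab; here resultId is the reference image identifier returned by a second-version algorithm for input image i (the variable names are ours, not the thesis code):

    isNeighbour = ismember(resultId, i + [-2 -1 1 2]);   % true if the result is i-2, i-1, i+1 or i+2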

We have also measured the time elapsed by each algorithm using Matlab's tic and toc; for the first ten inputs we obtained the elapsed times in seconds shown in Table 6. From Table 6 we can clearly see that SURF is much faster than the others.

The algorithms were implemented, tested and timed on a laptop with an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz, 8.00 GB of RAM (7.60 GB usable), running the Windows 7 Professional 64-bit operating system.

SURF (seconds) | ICP-BP (seconds) | EMD (seconds)
2.4961 | 281.9537 | 15.1807
1.0221 | 56.6271 | 11.5964
3.4673 | 943.5975 | 11.6985
2.0008 | 202.7352 | 11.6312
1.2311 | 83.5315 | 11.5829
1.1812 | 84.7508 | 11.2573
1.3165 | 100.7686 | 11.0771
1.1879 | 82.3641 | 11.2365
1.2721 | 86.8570 | 11.2313
1.3390 | 132.6327 | 11.2215
1.40164 (avg) | 205.58182 (avg) | 11.77134 (avg)

Table 6. Time elapsed by each algorithm for the first ten inputs.


Hence, according to all the analysis, we can say that the second version of step three of Chapter 5 (SURF) is the best approach to find an autonomous robot's position.

5.3 Conclusion

In our experiments we presented several methods, SURF, ICP-BP and EMD, together with their results and comparisons. In terms of accuracy and time, SURF is the best technique to find the position of an autonomous robot from a picture database, and the second version of the step three algorithm of this chapter is the most suitable approach to find the robot's position: the position of the image with the maximum number of matched points is taken as the position of the robot that took the input image. Moreover, the database or dataset should contain at least 360 images in one round for good results, and it would be better to take more than 360 images per round.


References

1. A. Garrell & A. Sanfeliu, Cooperative social robots to accompany groups of people. International Journal of Robotics Research, 31(13), 1675–1701, 2012.

2. X. Cortés & F. Serratosa, An Interactive Method for the Image Alignment problem based on Partially Supervised Correspondence, Expert Systems With Applications 42 (1), pp: 179 - 192, 2015.

3. Matlab Feature Extraction http://www.mathworks.com/discovery/feature-extraction.html

4. Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346--359, 2008

5. Besl, Paul J.; N.D. McKay (1992). "A Method for Registration of 3-D Shapes". IEEE Trans. on Pattern Analysis and Machine Intelligence (Los Alamitos, CA, USA: IEEE Computer Society) 14 (2): 239–256.

6. Kaspar Riesen, Michel Neuhaus, and Horst Bunke, “Bipartite Graph Matching for Computing the Edit Distance of Graphs”, Proceedings of 6th International Workshop on Graph Based Representations in Pattern Recognition, LNCS 4538, 2007, pp. 1–12.

7. K. Riesen, H. Bunke, "Approximate graph edit distance computation by means of bipartite graph matching", Image Vision Comput., 27 (7) (2009), pp. 950–959

8. F. Serratosa and X. Cortes, "Edit Distance Computed by Fast Bipartite Graph Matching", pages 253-262, 2014.

9. Y. Rubner, C. Tomasi, and L. J. Guibas. “A metric for distributions with applications to image databases”. IEEE International Conference on Computer Vision, pages 59-66, January 1998.

10. F. Serratosa and A. Sanfeliu, "Signatures versus Histograms: Definitions, Distances and Algorithms," Pattern Recognition, vol. 39, no. 5, pp. 921-934, 2006.

11. King, Kevin F., Geolocation and Federalism on the Internet: Cutting Internet Gambling’s Gordian Knot (July 14, 2009). Columbia Science and Technology Law Review, Vol. XI, 2010. Available at SSRN: http://ssrn.com/abstract=1433634

12. N. Snavely, S. Todorovic, “From contours to 3D object detection and pose estimation”, International Congress on Computer Vision, 2011.

13. A. Rubio et. al., “Efficient monocular pose estimation for complex 3D models”, accepted for publication in International Congress on Robotics and Automation, 2015.