Efficient Dense Optical Flow
Estimation from Two Images
Qiang Duan
u5541596
Supervised by:
Dr. Yuchao Dai
A thesis submitted in partial fulfilment of the degree of
Bachelor of Advanced Computing (Honours) at
The Research School of Computer Science
Australian National University
October 2016
The Australian National University | 1
Acknowledgements:
I would like to convey my highest respect and deepest gratitude to everyone who helped
me to complete this individual project and report.
Above all, I would like to thank my supervisor, Dr. Yuchao Dai, for his clear
instructions, rigorous manner and patient guidance in both my academic research and
report writing. In the very beginning, he gave me a detailed introduction about this
project and helped me to analyse how the algorithm should be implemented. As the
project proceeded, he kept encouraging me to improve the quality of the
implementation and conduct further research. This positive and rigorous academic
attitude will certainly help me perform more thorough academic research in the future.
I would also like to give a special thanks to Dr. Dingfu Zhou who gave me a tutorial in
a necessary module of my project, which helped me to generate the visual and dense
results.
Last but not least, I would like to thank the course convener, Weifa Liang, who
provided us with the tutorial room and taught us about presentations and report writing.
In addition, class feedback on the mid-term presentations was helpful in improving our
presentation skills.
Declaration:
Except where otherwise indicated, this project is entirely my own work.
Qiang Duan
27th October 2016
Abstract:
Optical flow estimation is a key component of many computer vision tasks. Aiming to
build per-pixel correspondences between two images, dense optical flow estimation
has broad applications in this field. This paper introduces a MATLAB
implementation of dense optical flow estimation that combines a coarse-to-fine scheme
with the PatchMatch matching method, referred to as CPM (Coarse-to-fine
PatchMatch). The method is inspired by the nearest neighbour field (NNF) method and
incorporates the efficient random search strategy of the PatchMatch
algorithm into a coarse-to-fine scheme for optical flow. In addition, by using the hybrid
programming techniques of C and MATLAB, the efficiency of this method is enhanced
as much as possible. The implementation of this method verifies and validates the
efficiency and accuracy of its performance. Unlike other techniques that are noisy and
lack accuracy in terms of large displacement optical flow, such as NNF, the CPM
method uses a propagation step with a random search strategy on each level to decrease
noise. In addition, the coarse-to-fine scheme allows the tiny structures to be extracted
level by level in order to improve the performance in large displacement situations.
Furthermore, this method uses a seed set at specific positions in the frames to represent
sets of adjacent pixels, achieving highly efficient optical flow estimation, and then uses an
edge-preserving interpolation method (EpicFlow) to realise dense optical flow
estimation. The results are then compared with the original CPM-Flow and other related
methods. As my existing knowledge is not sufficient to fully accelerate the MATLAB
code or overcome the inherent limitations of MATLAB, my implementation is slower
than the original but generates acceptably accurate results.
Contents
Acknowledgements: ......................................................................................... 1
Abstract: ........................................................................................................... 2
Contents .......................................................................................................... 3
Chapter 1. Introduction .................................................................................... 4
1.1 Overview: ................................................................................................ 4
1.2 Outline .................................................................................................... 5
Chapter 2. Background and Literature Review ................................................ 6
2.1 Background ............................................................................................. 6
2.2 Literature Review .................................................................................... 7
Chapter 3. Coarse-to-fine PatchMatch methodology ....................................... 9
3.1 Basic matching ........................................................................................ 9
3.2 Coarse-to-fine scheme .......................................................................... 12
3.3 SIFT descriptor and match cost ............................................................ 14
3.4 Outlier handling and interpolation method ............................................ 14
3.5 Improving efficiency .............................................................................. 15
Chapter 4. Experiments ................................................................................. 16
4.1 Evidence to substantiate previous statements ...................................... 16
4.1.1 Performance variation between pixel-based and patch-based
matching .................................................................................................. 16
4.1.2 The influence of cell size on SIFT descriptors................................. 18
4.1.3 The effect of the outlier handling method ........................................ 18
4.2 Effects of parameters ................................................................................ 19
4.3 Benchmark testing on MPI-Sintel database ................................................. 22
Chapter 5. Future work .................................................................................. 26
Chapter 6. Conclusion ................................................................................... 27
References .................................................................................................... 28
Appendix ........................................................................................................ 30
Chapter 1. Introduction
1.1 Overview:
As an important component of many computer vision tasks, optical flow has been
studied extensively for a long period of time. There are abundant research studies and
pioneering works in this field. However, for some specific situations, such as large
displacements and occlusions, optical flow estimation remains a challenging task and
more research is required to determine how a reliable optical flow from consecutive
video frames can be obtained. A coarse-to-fine scheme is commonly used to extract
details but often fails to estimate large displacements due to a loss of details when the
motion is fast and the propagation of errors when outliers accumulate from the coarsest
to finest level.
Recently, Hu, Song and Li [7] proposed a new algorithm known as Coarse-to-fine
PatchMatch (CPM), which is designed for large displacement optical flow estimation.
Inspired by PatchMatch [2], a very efficient and effective image processing algorithm,
the CPM algorithm adopts an ingenious solution that uses propagation with random
search in a coarse-to-fine scheme. Since the optical flow is closely related to the nearest
neighbour field algorithm, through the propagation with random search strategy of
PatchMatch, the efficiency of computation can be remarkably enhanced. Moreover, in
CPM flow, the seed set is applied for further reduction of computation time. The idea
is to use a pixel in a specific position to represent a patch of surrounding pixels. The
correspondences derived from the seed set represent semi-dense optical flow. Then, the
interpolation method of EpicFlow [10] is used to interpolate the correspondences to
realise the dense optical flow. Aside from the PatchMatch algorithm, the coarse-to-fine
scheme and the EpicFlow interpolation, the CPM algorithm also uses SIFT descriptors
of SIFT Flow [4] to calculate the match cost and a forward-backward consistency check
[1] to remove the outliers.
In this project, the primary goal is to reconstruct the code for the CPM algorithm. As
the C++ source code of CPM has not yet been released, my supervisor asked me to
write the CPM code according to the algorithm’s description in [7]. During this period,
I read plenty of materials related to this algorithm, including but not limited to the
PatchMatch algorithm [2], EpicFlow [10], SIFTFlow [4] and forward-backward
consistency check [1]. Since I have limited knowledge of computer vision and optical
flow, I was required to first learn about basic concepts and background knowledge. For
the programming software, I chose MATLAB to realise the algorithm for two main
reasons. The first reason is that MATLAB offers many advantages in terms of data and
image processing, so I could easily use these features for image processing and get on
the right track as soon as possible. The second reason is that the original code is said to
be realised in C++ by Hu, Song and Li, though it has not yet been released. To simply
repeat their work is not a credible way to perform academic research and it would not
be appropriate to attempt to improve the work of another scholar. Thus, the most
suitable approach was to combine my own learning experience with the functions
offered by MATLAB software. Once I had completed the MATLAB code, I was
required to improve the efficiency of computation, which was achieved by using hybrid
programming and optimising the structure of the code. These measures enhanced the
efficiency of the code.
1.2 Outline
The structure of this report is arranged as follows: Chapter two presents an introduction
of literature relevant to the research topic in order to discuss the foundation theories
that make the CPM algorithm realisable. In Chapter 3, the framework of the CPM
algorithm is presented and the main features are discussed. In Chapter 4, the results of
experiments and parameter analysis will be presented and discussed. Chapter 5
discusses future work in this area and Chapter 6 provides a conclusion of the study.
Chapter 2. Background and Literature Review
2.1 Background
The aim of this project is to realise a state-of-the-art efficient dense optical flow
estimation from two images. The optical flow is defined as the distribution of velocities
of an object in an image. It can be represented by arrows or colour patches and provides
information about the spatial arrangement of images and how it changes. The optical
flow is generally applied in computer vision to characterise the motion of objects in
consecutive images or video frames. Through the implementation process, I developed
valuable research skills and improved my self-learning ability with regard to advanced
technologies in computer science. This project also required me to apply relevant
knowledge in practice, which was not only a significant challenge but also a necessary
step in becoming a more qualified university student. More specifically, during the
initial stage of the research process, I was required to understand general concepts and
acquire basic knowledge of the optical flow estimation. Following this, as there are
different methods of optical flow estimation, I needed to compare state-of-the-art
methods in order to determine the advantages and disadvantages of each (see Section
2.2). Then, implementing the random search strategy PatchMatch in coarse-to-fine
fashion was an essential part of this project. The algorithm was published in 2016 by
Hu et al. with the exception of the source code. While the original code was
implemented in C++, my supervisor and I decided to implement the algorithm in
MATLAB because this software has many useful features in image and data processing
that can facilitate improvements to the performance of the code. Moreover, we
originally intended to pursue an efficient real-time implementation on the CPU, since
most existing high-performance methods rely on the OpenCV C/C++ library and the
GPU. However, as the project progressed, MATLAB's limited computational efficiency
became a bottleneck. In other words, even after the efficiency had been improved as
much as possible, the algorithm realised in MATLAB was not fast enough for real-time
applications. Therefore, we did not succeed in bringing the MATLAB code to real-time
performance.
2.2 Literature Review
Since we are limited by space and time, only the most closely related literature will be
reviewed.
I first refer to the seminal paper by Horn and Schunck [6], who present a classic
energy minimisation method for the optical flow estimation. This paper provides me
with insights into the general concept of optical flow, which the authors describe as the
distribution of velocities of an object in an image. The optical flow can be represented
by arrows or colour patches, which provide information about the spatial arrangement
in images and how it changes. The CPM-flow paper [7] offers a detailed description of
the CPM algorithm combining the coarse-to-fine scheme and the PatchMatch matching
method. Through this paper, I learn about state-of-the-art optical flow methods devised
in recent years. Based on the NNF technique, Bao et al. [1] apply the PatchMatch
algorithm to their edge-preserving method. The matching algorithm is effective, but
without further measures to improve the performance in large displacement situations,
it often fails when faced with significant occlusions. The descriptor matching method
is introduced by Brox and Malik [3], who offer an innovative approach to the matching
procedure in optical flow estimation. Many of these new methods are inspired by
descriptor matching techniques, such as SIFTFlow. Moreover, Xu et al. [8] propose a
coarse-to-fine scheme that is used to refine the flow on each level. However, as its
matching is sparse and its result depends on accurate initialisation, the performance of
the scheme is unsatisfactory when there are small details and large motion.
The following studies are all closely related to my project. The PatchMatch [2]
algorithm was proposed in 2009, which was initially used as an interactive image
editing tool. It uses the NNF technique for matching and adopts a propagation strategy
with random search to find the nearest-neighbour matches between two images. The
propagation step ensures that the pixel or the patch which has a high probability of
being the right match is checked. The random search step ensures the optimisation will
not fall into local optimisation. In my project, the propagation with random search
strategy is essential to achieving fast and optimised matching between two images.
Then, the SIFTFlow [4] provides powerful SIFT descriptors that extract features from
images and extract information from these features to a 128-dimensional descriptor for
each pixel. By using descriptors in the matching process, the complexity of the
descriptor ensures every patch that contains different features has sufficient differences,
thereby achieving a more accurate matching result. Furthermore, a fast edge-preserving
interpolation is realised by EpicFlow [10] as it can interpolate a sparse or semi-dense
optical flow to a dense optical flow without losing the edge information. The
“structured edge detector” (SED) [5] is used to generate the edge data used by EpicFlow.
The EpicFlow uses the interpolation method on the results of DeepMatching, but the
efficiency of DeepMatching limits the performance of EpicFlow.
Chapter 3. Coarse-to-fine PatchMatch
methodology
In this chapter, I will present the entire framework of CPM, including the basic
matching method, the coarse-to-fine framework, the descriptor matching method, the
EpicFlow interpolation method, the outlier handling method and the efficiency improvements.
Overall, the CPM is a hierarchical nearest neighbour field algorithm that blends a
random search strategy and a propagation approach to realise efficient optical flow
estimation in large displacement scenes.
3.1 Basic matching
In order to reduce the computation time and ensure satisfactory accuracy, in CPM, the
seeds in specific positions are chosen to represent surrounding pixels (Figure a). Thus,
the computation time can be reduced significantly because the matching step finds the
best correspondence between the seeds rather than every pixel. More specifically, the
specific position of the seed is set on the cross point of an image grid with spacing of d
pixels. In other words, in every d × d grid, there is only one seed to represent this grid.
The matching process can be formally described as follows: for two images I1 and I2 and a
seed set S = {s_m} positioned at {p(s_m)}, CPM determines the flow of each
seed: f(s_m) = M(p(s_m)) − p(s_m), where M(p(s_m)) is the
corresponding position in I2 of the seed s_m located at p(s_m) in I1 [7]. The cost function is shown below:

f(s_m) = arg min_{f(s_i)} C(f(s_i)), s_i ∈ {s_m}   (1)

where C(f(·)) denotes the match cost (distance) between two seeds; its computation
is given in Section 3.3.
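To make the seed layout concrete, here is a small Python sketch (the thesis implementation is in MATLAB; the function name and NumPy usage are my own illustration, not the project code):

```python
import numpy as np

def make_seed_grid(height, width, d):
    """Place one seed at each cross point of an image grid with spacing
    d pixels, so every d x d cell is represented by a single seed.
    Returns an (n, 2) array of (row, col) seed positions."""
    rows = np.arange(0, height, d)
    cols = np.arange(0, width, d)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)

# A 20 x 15 image with spacing d = 5 yields a 4 x 3 grid of 12 seeds.
seeds = make_seed_grid(20, 15, 5)
```

Matching then only has to find correspondences for these n seeds rather than for every pixel, which is where the speed-up reported in Section 4.1.1 comes from.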
In Section 4.1.1, the table shows that the computation time is remarkably decreased by
using seeds (patch) instead of every pixel and indicates that the accuracy has not been
compromised (Figure 1.1-1.4).
After the construction of the seed set, the neighbour system is completed. In
PatchMatch and CPM, the random search and propagation strategy is then performed
to search for correspondences. The illustration is shown in Figure c [2]. In CPM, it is
performed between seeds. In the original scheme, PatchMatch and CPM perform the
propagation and the random search in an interleaved manner. For example, if
we use P to represent propagation and R to represent random search, the matching
step proceeds as P1, R1, P2, R2, …, Pn, Rn. A complete propagation pass
is required before the random search step is undertaken. As the name suggests, the
propagation strategy means that the examined (or initialized on coarsest level) flow
values will be propagated from neighbour seeds to the current seed in the current
iteration and will also be propagated from the current seed to other seeds in later
iterations. This requires a full pass over all the seeds. In my opinion, this method is a waste
of time because the propagation step could be blended into the random search. Thus, in
the initial stage, I adjusted the propagation technique. As the random search is also
performed in a loop fashion, I blended the propagation into the random search. However,
this method decreased the accuracy while not significantly decreasing the running time.
Thus, I abandoned this approach and reverted back to the original propagation.
The propagation process propagates the flow value before the random search, by scan
order (forward) in odd iterations and by reverse scan order in even iterations. The logic
behind the propagation is that: in a neighbour system, to improve the current pixel f(x,y),
we use the known correspondences of f(x-1,y) and f(x,y-1), assuming f(x,y) is the same
as f(x-1,y) and/or f(x,y-1) [2]. For example, if we already have a good mapping at (x-1,y)
and (x,y) is contiguous with or has the same texture as (x-1,y), then the
correspondence at (x-1,y) has a high probability of carrying over to (x,y). Thus, given
that D(v) denotes the distance of two patches at position (x,y), we take the value for
f(x,y) to be the arg min{D(f(x,y)),D(f(x-1,y)),D(f(x,y-1))} in odd iterations [2].
Moreover, in even iterations, we take the value of arg
min{D(f(x,y)),D(f(x+1,y)),D(f(x,y+1))}. The process of computing match cost (also
referred to as distance) is shown in Section 3.3.
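The odd/even sweeps described above can be sketched in Python as follows (a simplified illustration with hypothetical names, not the thesis's MATLAB code; cost_fn stands in for the SIFT match cost of Section 3.3):

```python
import numpy as np

def propagate(flow, cost_fn, iteration):
    """One propagation sweep over a 2-D grid of seed flows.

    flow: (H, W, 2) array of current flow guesses per seed.
    cost_fn(pos, f): match cost of assigning flow f to the seed at pos.
    Odd iterations sweep in scan order and borrow from the left/top
    neighbours; even iterations sweep in reverse scan order and borrow
    from the right/bottom neighbours, as in PatchMatch [2].
    """
    h, w, _ = flow.shape
    if iteration % 2 == 1:  # forward sweep: left/top neighbours
        ys, xs, dy, dx = range(h), range(w), -1, -1
    else:                   # reverse sweep: right/bottom neighbours
        ys, xs, dy, dx = range(h - 1, -1, -1), range(w - 1, -1, -1), 1, 1
    for y in ys:
        for x in xs:
            best = cost_fn((y, x), flow[y, x])
            for ny, nx in ((y + dy, x), (y, x + dx)):
                if 0 <= ny < h and 0 <= nx < w:
                    c = cost_fn((y, x), flow[ny, nx])
                    if c < best:  # adopt the neighbour's flow if cheaper
                        best = c
                        flow[y, x] = flow[ny, nx]
    return flow
```

Because the sweep visits seeds in scan order, a good correspondence found at one seed can spread across the whole grid within a single iteration.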
The random search process is much simpler. After the current level is initialised (when
the current level is the coarsest level) or the correspondence passed from the last level
is accepted (when the current level is not the coarsest level), an approximate
correspondence for current seeds is calculated. Therefore, the random search process
identifies the optimum potential correspondences in a specific area. The area is formally
defined by equation 2:
u_i = v0 + w·α^i · R_i   (2)

where u_i is the i-th candidate correspondence for the current seed, v0 is the initial
correspondence, R_i is a uniform random value in [-1,1] x [-1,1], w is the maximum
search radius, and α is a scale ratio usually fixed to 1/2. The equation describes a
gradually shrinking search area for i = 0, 1, 2, … until the search radius w·α^i is less
than 1. Over this sequence of random searches, we always retain the correspondence
that has the lowest match cost (see Section 3.3).
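Equation 2 translates into a short loop. The sketch below is my own Python illustration (in line with PatchMatch, each new candidate is sampled around the best correspondence found so far; cost_fn again stands in for the match cost of Section 3.3):

```python
import numpy as np

def random_search(v0, cost_fn, w=8, alpha=0.5, rng=None):
    """Sample candidates u_i = best + w * alpha**i * R_i, with R_i
    uniform in [-1, 1]^2, shrinking the radius until w * alpha**i < 1,
    and keep the candidate with the lowest match cost (Eq. 2)."""
    rng = np.random.default_rng(rng)
    best = np.asarray(v0, dtype=float)
    best_cost = cost_fn(best)
    i = 0
    while w * alpha ** i >= 1.0:
        cand = best + w * alpha ** i * rng.uniform(-1.0, 1.0, size=2)
        c = cost_fn(cand)
        if c < best_cost:
            best, best_cost = cand, c
        i += 1
    return best, best_cost
```

With w = 8 and α = 1/2, the loop tries radii 8, 4, 2 and 1, so only four candidates are examined per seed, yet the shrinking radius lets the search escape local optima.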
3.2 Coarse-to-fine scheme
The basic matching section introduced the propagation and random search strategy.
Like PatchMatch, this matching alone is noisy, and many outliers may be generated.
One way to mitigate this is to increase the patch size, which can be achieved by using
the seeds. However, this results in a loss of accuracy. Thus, the coarse-to-fine scheme is
introduced to provide global regularization and maintain a satisfactory level of accuracy.
It is a hierarchical structure and is usually referred to as an image pyramid, as shown in
Figure (a) [7]. In the pyramids, there are several levels from top to bottom. These levels
indicate the degree of precision: the higher the level, the coarser the image. Given a
scale factor n, we can construct a k-level pyramid. The factor is generally set to 1/2,
which means the resolution of the image is halved with every level up. For two images
I1 and I2, the l-th level of the pyramid of Ii is denoted Ii^l, i ∈ {1, 2},
l ∈ {0, 1, …, k − 1} [7]. The top level of the pyramid is the coarsest level and the
bottom level is the finest level, which usually contains the raw image. The aim is to
find the correspondence of every seed in I1^0 and I2^0, starting from I1^(k−1) and
I2^(k−1).
The seed should be set up on each level at the same position for both images. The
position of the representing pixel is recorded in each seed. For example, for an image
with 1024x512 pixels, to get a seed set that has a spacing of 4 pixels on each level, we
should construct a 256x128 seed set. The seeds are set up at the cross points, such as
(1, 1), (1, 5), (5, 1), (5, 5), etc. Since CPM does not adopt any sub-pixel accuracy and
the number of seeds is always the same, on some coarser levels several seeds might
represent the same pixel. All positions inside the seeds are truncated to the nearest
integer. This means that the propagation with random search is extensively executed on
coarser levels, which could ensure the robustness of correspondences on higher levels
[7].
The propagation and random search are then performed on each level. The first step is
to initialise the flow value on the coarsest level because the process will start from the
coarsest level and pass the value to the next level for initialising. The initial flow value
on the top level could be a random value. Thus, performing a propagation and random
search on this level will generate a better flow value that is used to initialise the next
level according to equation (3) [7]:

{f(s^l)} = (1/n) · {f(s^(l+1))}, l < k − 1   (3)

where n is the scale factor, usually set to 1/2, s^l is the seed set on level l, and
{f(s^(l+1))} is the flow passed down from the previous level. The propagation between
adjacent levels continues until the finest level generates the final result.
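The overall coarse-to-fine driver can be sketched as follows. This is a Python illustration under my own naming, not the thesis's MATLAB code; match_on_level stands for one round of propagation plus random search on a single pyramid level:

```python
import numpy as np

def coarse_to_fine(match_on_level, k=5, n=0.5):
    """Run matching from the coarsest level k-1 down to level 0.

    match_on_level(level, init_flow) refines the seed flows on one
    pyramid level and returns them.  Between levels the flow is
    rescaled by 1/n (Eq. 3): with n = 1/2 the image doubles in
    resolution at each finer level, so the flow doubles too.
    """
    flow = None  # the coarsest level starts from a random initialisation
    for level in range(k - 1, -1, -1):
        flow = match_on_level(level, flow)
        if level > 0:
            flow = flow / n  # pass the scaled flow down to the next level
    return flow
```

The loop makes explicit that errors refined away on a coarse level never reach the finer levels, which is the point of performing the consistency check and matching at every level.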
The full algorithm pseudocode is given in [7].
3.3 SIFT descriptor and match cost
In the dense SIFT flow [4], an effective descriptor is proposed to extract features for
every pixel in an image. The SIFT descriptor combines the features of surrounding
pixels to derive a 128-dimension descriptor for a single pixel. The process of generating
SIFT descriptors is very fast so that adopting this approach does not significantly
increase the running time but provides a powerful and feasible module that can be used
in the matching process to compute the distance between seeds. In this section, the code
for generating SIFT descriptors is cited from Ce Liu’s website, which is also mentioned
in [4]. Furthermore, the matching process is performed to compute the summation of
absolute difference between two 128-dimension descriptors as the match cost (distance).
The equation is shown in Equation 4, where d^1 is the descriptor from the first image
and d^2 is the descriptor from the second image:

C = Σ_{i=1}^{128} |d_i^1 − d_i^2|   (4)
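Equation 4 is a plain sum of absolute differences (SAD) over the 128 descriptor dimensions. A minimal Python sketch (the thesis computes this in a C MEX function; the name here is mine):

```python
import numpy as np

def match_cost(d1, d2):
    """Sum of absolute differences between two 128-dimensional SIFT
    descriptors (Eq. 4); a lower cost means a better match."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    assert d1.shape == d2.shape == (128,)
    return float(np.abs(d1 - d2).sum())
```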
According to [4], a single descriptor is generated based on the features of a patch. Thus,
I assume that the patch size of a SIFT descriptor and the seed spacing of CPM might
have a correlation so that applying the same patch size will achieve a better performance.
The experimental results confirm my assumption and further indicate that when the size
of the SIFT patch is less than the seed spacing of CPM, there are more outliers; on the
other hand, when the size of the SIFT patch is larger than the size of the CPM patch, as
the SIFT patch grows, the result becomes fuzzy. A SIFT patch size that is the same size
or slightly smaller will help improve the correspondences. The results are shown in
Section 4.1.2. The experiments also demonstrate that the distance between two seeds
rarely appears to be the same when using descriptors. Therefore, the correspondences
generated by this matching process are convincing.
3.4 Outlier handling and interpolation method
As in [1] and [7], I also adopt a forward-backward consistency check. This is useful in
detecting the occlusions and removing the outliers. In order to ensure the error will not
propagate and accumulate level by level, the forward-backward consistency check is
performed at every level. The forward-backward consistency check involves
performing the programme twice, once in a forward direction to find the
correspondences from I1 to I2 and once in a backward direction to find the
correspondences from I2 to I1. If the difference at the same position is larger than a
preset value, the correspondence at that position will be treated as an outlier and
removed. Furthermore, to improve the accuracy, a threshold is set to a specific value to
limit the upper bound of the match cost. If the match cost is larger than the threshold,
even if the match cost is the lowest value of all iterations, the correspondence will not
be recorded. The experiments show that these two outlier handling methods
considerably improve the accuracy, but the forward-backward check doubles the
running time at each level and hence roughly doubles the total running time. This is
the inevitable trade-off between efficiency and accuracy. However, the noticeable
improvement in accuracy is worth the extra time.
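The forward-backward check can be sketched in Python as below. This is my own simplified illustration (integer flows on the seed grid, nearest-seed lookup), not the thesis's MATLAB code:

```python
import numpy as np

def consistency_check(forward, backward, tol):
    """Mark a seed as an outlier when the forward flow and the backward
    flow at its target disagree by more than tol pixels.

    forward, backward: (H, W, 2) integer (dy, dx) flows on the seed grid.
    Returns a boolean (H, W) mask, True where the match is kept.
    """
    h, w, _ = forward.shape
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            fy, fx = forward[y, x]
            ty, tx = y + int(fy), x + int(fx)  # target seed in the second image
            if 0 <= ty < h and 0 <= tx < w:
                # A consistent pair satisfies backward(target) ≈ -forward(source).
                diff = forward[y, x] + backward[ty, tx]
                keep[y, x] = np.abs(diff).sum() <= tol
    return keep
```

Seeds whose target falls outside the image, as happens under occlusion, are rejected outright, which matches the removal of the ghosting discussed in Section 4.1.3.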
After the outlier handling process has been performed, the primary flow vector can be
derived. However, the flow is not dense until interpolation in the next step. With regard
to the interpolation method, in [10], the EpicFlow provides an edge-preserving
interpolation method. By using the SED (structured edge detector) [5], the edge data of
the source frame is derived. Then, given the edge data, the EpicFlow performs
interpolation within each block divided by the edge, which can preserve the edge as
much as possible. Finally, the primary semi-dense flow becomes a dense flow. At this
point, the whole process of generating CPM flow including the interpolation step is
complete.
3.5 Improving efficiency
As MATLAB is good at image and data processing but has low computational
efficiency, the running time of my MATLAB code is quite long compared with the
original CPM. Thus, guided by the output of MATLAB's built-in profiler (the "profile" function), the
core function that is used to compute the distance between two seeds is adapted to C
language and integrated into a MEX file. A MEX file can provide an interface between
MATLAB and the C function. Generally speaking, C is much faster than MATLAB.
The running time is shortened by more than one-third without any loss of accuracy. The
optimisation of the code structure is also useful in increasing efficiency, such as by
removing unnecessary functions and changing some complex data structures to vectors
that are easy for MATLAB to process. Using this step, the computation time is
decreased by a further one-fifth. Nevertheless, the running time is still five times higher
than that of the original CPM.
Chapter 4. Experiments
In this section, I will first provide some evidence to substantiate previous statements.
Secondly, I will change some parameters in the MATLAB code to explore their effects
and try to find a relatively good combination. Finally, I will use an online benchmark
dataset to evaluate the performance of my project, which is the most convincing way to
assess its validity and application. Unfortunately, as the implementation is not fast
enough, real-time optical flow estimation is not realised.
4.1 Evidence to substantiate previous statements
4.1.1 Performance variation between pixel-based and patch-based
matching
In Section 3.1, the computation time of pixel-based optical flow estimation is said to be
much higher than that of patch-based optical flow estimation. The results are shown
here:
As two for-loops are used in my code to
traverse an image, the time complexity of
this part is O(n^2). Thus, an increase in
the number of seed points will significantly
increase the computation time. The seed
spacing determines the number of seeds: for
the pixel-based method, the spacing can be
considered to be 1; for the patch-based
method, the seed spacing is 5, which means
that, theoretically, the pixel-based optical
flow will take 25 times (5^2) longer than the
patch-based one. The measured results show a
multiple of around 20.
Although there is a roughly 20-fold difference between the computation times, the
results do not show any significant variation in quality. As Figures 1.1-1.4 show, the
1024x436 results in the left column are not noticeably better than the 205x88 results
in the right column of the same row. However, the number of levels does affect the
accuracy in a visible way: in the bottom two images, there are fewer outliers. While
the bottom-right image is still noisy, it is acceptable and can be interpolated into a
smooth flow.
Table 1: Computation time of pixel-based vs. patch-based optical flow estimation

Method                              Time (s)
One level:
  Pixel-based optical flow          188.28
  Patch-based optical flow           11.24
Three levels:
  Pixel-based optical flow          579.42
  Patch-based optical flow           27.59
4.1.2 The influence of cell size on SIFT descriptors
Based on the comparison in Figure 2, we can see that if the size of the SIFT patch is
too small, such as 1, the flow will have many outliers, some of which are removed, as
shown by the white regions in the third image. Since the white regions can be
interpolated by EpicFlow, a small patch of white does not matter. A flow with a SIFT
patch and seed patch of equal size will generate a better and more satisfactory result. If
the size of the SIFT patch is much larger than the seed patch, the edge of the flow will
be too fuzzy to retain its shape. Therefore, the size of the SIFT patch should be equal
to or slightly smaller than the seed spacing in practice.
4.1.3 The effect of the outlier handling method
In Figure 3, the bottom two images demonstrate the effect of the forward-backward consistency check. These flows are derived from large-displacement images in which some elements of the first image disappear in the next image (not shown here), which is referred to as occlusion. Without a consistency check, occlusions result in an incorrect flow. For example, in the top-right corner of the blade in the last image, the red colour is ghosting of the blade from the next image, which should not appear. After applying the consistency check, the ghosting is eliminated in the bottom-left image. Furthermore, the sharpness of edges is also improved, such as the outlines of the girl and the blade.
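The check described above can be sketched as follows. This is a minimal Python illustration (the thesis implementation is in MATLAB, and the array names are illustrative): a match at pixel p with forward flow f(p) is kept only if the backward flow at p + f(p) roughly cancels it.

```python
import numpy as np

def consistency_mask(fwd, bwd, threshold=1.0):
    """fwd, bwd: (H, W, 2) forward and backward flow fields, channels (x, y).
    Returns a boolean mask of consistent (non-occluded) pixels, where
    |f(p) + b(p + f(p))| < threshold."""
    H, W, _ = fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Target positions in the second image, rounded and clipped in-bounds.
    tx = np.clip(np.round(xs + fwd[..., 0]).astype(int), 0, W - 1)
    ty = np.clip(np.round(ys + fwd[..., 1]).astype(int), 0, H - 1)
    diff = fwd + bwd[ty, tx]          # ~0 for consistent matches
    return np.linalg.norm(diff, axis=2) < threshold
```

Pixels failing the check are marked as outliers (the white/empty regions discussed above) and left for the interpolation stage.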
4.2 Effects of parameters
There are eight parameters that can be edited. They are, respectively, the number of pyramid levels, the number of iterations, the seed spacing, the cell size of a SIFT patch, the maximum search radius, the number of random searches for each seed, the threshold for the matching cost and the range of the forward-backward consistency check. The default parameters are [5, 4, 5, 5, 8, 16, 2000, 100], which have already been tested on other images and balance time consumption against accuracy. The parameters can be divided into three types according to what they affect: time and accuracy, time only, and accuracy only. Accuracy is measured by the average end-point error (AEE), the per-pixel difference between the estimated flow and the ground truth, averaged over all pixels. The experiments use a controlled-variable method, changing a single parameter at a time and observing its effect. The test data are a set of large-displacement images on which my code does not achieve adequate performance; in other words, these images are sensitive to parameter changes. For each parameter, the experiment is conducted three times and the average is taken as the final result.
Note that all experiments are run on an Intel Core i7 4.0 GHz CPU.
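The AEE metric can be stated precisely with a short sketch. This is Python for clarity only (the experiments themselves were run in MATLAB): the end-point error at a pixel is the Euclidean distance between the estimated and ground-truth flow vectors, and the AEE is its mean over all pixels.

```python
import numpy as np

def average_endpoint_error(est, gt):
    """est, gt: (H, W, 2) flow fields. Returns the mean Euclidean distance
    between estimated and ground-truth flow vectors over all pixels."""
    return float(np.mean(np.linalg.norm(est - gt, axis=2)))
```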
The first type, which affects both time and accuracy, includes the number of levels, the seed spacing and the number of random searches. The results are shown in Figure 4.1 to Figure 4.4.
In this set of figures, the left Y-axis shows the time consumption and corresponds to the blue line (the line with square markers). The orange line with crosses shows the average end-point error, corresponding to the right Y-axis. Figure 4.1 demonstrates that the number of levels affects the time consumption linearly, but the average end-point error does not decrease linearly. Once the number of levels exceeds 3, there is little improvement, primarily because the number of levels determines the resolution of the low levels: as the number grows, the resolution decreases exponentially and becomes too coarse to distinguish, contributing little to the matching process. Thus, 3 levels is a sensible choice to balance time and accuracy.
In Figure 4.2, note that the time consumption drops rapidly (roughly quadratically) as the seed spacing increases, for the reason explained in Section 4.1.1.
In Figure 4.3, note that the number of random searches does not significantly affect the AEE, but the time increases with the number of random searches. The reason is obvious: the number of random searches directly determines the number of cost computations. Thus, for general images, 8 or 16 random searches are enough for accurate matching.
The number of iterations is another factor that affects performance. One iteration corresponds to one one-way propagation pass, in scan order or reverse scan order. Theoretically, the accuracy is positively correlated with the number of iterations; however, the time consumption increases linearly at the same time. Thus, a value around the crossover point of the two curves, such as 4 or 6, is an ideal choice.
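The one-way pass described above can be sketched in one dimension. This Python fragment only illustrates the scan-order idea (the thesis code is MATLAB and works on a 2-D seed grid); `cost(i, f)` stands in for the SIFT match cost of assigning flow value `f` to seed `i`.

```python
def propagate(flow, cost, reverse=False):
    """One one-way propagation pass: each seed adopts the flow of its
    already-visited neighbour if that flow has a lower match cost."""
    order = range(len(flow) - 1, -1, -1) if reverse else range(len(flow))
    prev = None
    for i in order:
        if prev is not None and cost(i, flow[prev]) < cost(i, flow[i]):
            flow[i] = flow[prev]      # adopt the neighbour's better match
        prev = i
    return flow
```

Alternating the scan direction between iterations lets good matches spread across the whole grid in a few passes.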
The remaining parameters are shown in Figure 5.1 to Figure 5.4. Some parameters primarily affect the AEE and barely affect the time, such as the scale of the consistency check (Figure 5.2), the matching threshold (Figure 5.3) and the cell size of the SIFT descriptor (Figure 5.4). This is because these parameters do not increase the time complexity; they affect the result only by restricting the threshold or the consistency-check scale. As for the cell size of the SIFT descriptor, it only changes the precision of the elements used in computing the match cost; the descriptor dimension does not change, so the time consumption is unaffected.
The accuracy as a function of the maximum search radius (Figure 5.1) shows a different trend: the AEE decreases at first but starts to increase beyond a certain point. This is probably because a large search field introduces more noise than a small radius, while an appropriately sized radius helps avoid falling into a local optimum. Thus, the maximum search radius should be chosen according to the extent of the motion: large-displacement motion warrants a relatively large search radius, and vice versa. If a large search radius introduces more outliers than expected, the outlier handling method should be reconsidered, though this situation did not occur in the experiments. The time also changes, because the radius is halved in each iteration; in effect, if log2(radius) is greater than or equal to the number of iterations, the time increases.
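The halving schedule behind that last observation can be made concrete with a short sketch (a Python illustration of the schedule only, not the MATLAB implementation):

```python
# The random-search radius is halved each iteration, so if log2(radius) >=
# number of iterations, the radius never shrinks to 1 before the iterations
# run out and every iteration keeps performing larger searches.

def radius_schedule(radius, iterations):
    """Radius used by the random search in each iteration."""
    sched = []
    for _ in range(iterations):
        sched.append(radius)
        radius = max(1, radius // 2)
    return sched

print(radius_schedule(8, 4))    # [8, 4, 2, 1]  -- shrinks to 1 in time
print(radius_schedule(16, 4))   # [16, 8, 4, 2] -- log2(16) = 4 >= 4 iterations
```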
Following the analysis of all these parameters, a relatively good combination is derived: 3 levels, a seed spacing of 5 (or 6), 16 random searches per iteration, 4 (or 6) iterations, a search radius of 8 (or 16 for large motion), a consistency-check scale of 100, a matching threshold of 2000 and SIFT descriptors with a cell size of 3. These parameters are used for the online benchmark in the next section.
4.3 Benchmark testing on MPI-Sintel database
Using the parameter combination derived in the previous section, my work is tested on the MPI-Sintel database. This is a challenging benchmark based on an open-source animated film. The test set has 1128 images divided into a clean version, in which motion blur and visual effects are removed, and a final version, extracted from the actual movie with full motion blur and visual effects. Each version contains twelve image sequences in different scenes. The MPI-Sintel dataset provides the visual results and ground truth so that different methods can be compared. In this section, my work is referred to as custom-cpm.
On the MPI-Sintel benchmark, custom-cpm ranks 62nd on both the final and clean sets as measured by the overall average end-point error, while CPM-flow ranks 8th and EpicFlow ranks 15th (rankings last checked on 22 October 2016). Table 2 shows the performance of custom-cpm, CPM-flow and EpicFlow.
Method        Time     AEE all   AEE matched   AEE unmatched
Clean set
  Custom-CPM   22 s      8.444      3.471         48.890
  CPM-flow     4.3 s     3.557      1.189         22.889
  EpicFlow     16.4 s    4.115      1.360         26.595
Final set
  Custom-CPM   22 s      9.467      4.514         49.781
  CPM-flow     4.3 s     5.960      2.990         30.177
  EpicFlow     16.4 s    6.285      3.060         32.564

Table 2: Test results on the MPI-Sintel dataset. "AEE matched" is the AEE on non-occluded areas; "AEE unmatched" is the AEE on occluded areas. Time consumption figures are taken from [7] and [10].
These statistics convey two things. First, the time consumption of custom-cpm is much higher than that of CPM-flow, so real-time optical flow estimation is impossible with this implementation. The reason might be MATLAB's limited computing performance. In the middle of the semester, I rewrote a crucial function that computes the match cost from MATLAB code into C and compiled it with MEX in MATLAB; the time consumption decreased by almost 50% from this one function alone. The rewrite has such a significant effect because compiled C code avoids MATLAB's interpreter overhead. Furthermore, the match-cost function can be called more than 10 million times across the propagation and random-search steps, so even a minor improvement is magnified into an obvious change in performance.
Second, the total AEE of custom-cpm is more than twice that of CPM-flow on the clean set and slightly less than twice on the final set. Compared with EpicFlow, custom-cpm's AEE is still about twice as high on the clean set and about 50 per cent higher on the final set. The visualised results on the benchmark also suggest gaps between custom-cpm and the other methods.
In Figure 6, we can see from the first column that custom-cpm performs better on the stick image. CPM-flow either does not find an appropriate match for the stick or treats it as an outlier and removes it; EpicFlow, on the contrary, overfits on the stick, so many outliers are retained. Compared to CPM-flow, custom-cpm achieves poor accuracy at the edges of the image, such as the bottom-left and top-right corners, and has a pale area on the left side of the stick. In the second column, custom-cpm shows obvious defects on the left edge of the image and on the right side of the girl's body; these parts display a clearly different texture and an unclear boundary with their neighbourhoods. However, custom-cpm still performs commendably in the area between the two feet. In addition, it produces a clear boundary rather than the obscured boundary that CPM-flow and EpicFlow display. In the third column, custom-cpm shows remarkable strength in preserving details. For instance, there are a few birds in the top centre of the image; custom-cpm preserves more than half of them, while CPM-flow and EpicFlow preserve none. Nevertheless, custom-cpm still fails to match the largest bird at the top right.
To determine why custom-cpm performs poorly in the aforementioned areas, I checked the original images and compared the semi-dense flows of custom-cpm and CPM-flow before interpolation by EpicFlow. The comparison is shown in Figure 7.
In Figure 7, owing to the different representation methods, the white in the first two rows means the same as the black in the third row: both indicate the removal of outliers. The first row uses a small consistency-check factor, which removes a large percentage of outliers but also discards many correct correspondences, for example on the left side of the body in the top-left image and at the right boundary of the top-right image. If we increase the factor, however, many outliers are preserved along with the correct correspondences. Thus, after interpolation, custom-cpm performs poorly on the left boundary of the middle-left image because the outliers degrade the accuracy of the interpolation. This dilemma may be attributable to the lack of a reliable occlusion handling method: the occlusion handling in custom-cpm simply compares the flow values between the forward and backward results, which may not be enough.
Furthermore, there is another issue with the interpolation process. As mentioned previously, the white pixels in the first two columns and the black pixels in the third column mean the same thing: both indicate removed outliers and empty flow values. Despite using the same interpolation method (EpicFlow), the generated outputs differ in the empty regions (shown in Figure 8). Custom-cpm has a large empty area in the top-middle position, as does CPM-flow; however, the interpolation for CPM-flow produces a much better result than the interpolation for custom-cpm. On the left edge of CPM-flow there is an empty area caused by occlusions, yet this area is interpolated perfectly by EpicFlow. Meanwhile, on the right edge of custom-cpm the situation is almost the same, with an empty area that needs to be interpolated, but EpicFlow only blurs the border between the white and cyan regions. This is difficult to explain, as EpicFlow does not provide any API for changing the interpolation parameters. Thus, I had no choice but to compromise by increasing the consistency-check factor so as to retain as many correct correspondences as possible. By the time I identified this problem with occlusions and large-displacement motion scenes, there was not much time left to find another interpolation method or incorporate a more reliable occlusion handling method.
Overall, custom-cpm is slow but preserves details in video frames well. However, due to the lack of a reliable occlusion handling method, its accuracy is relatively poor in occluded regions and large-displacement motion scenes. This disadvantage could perhaps have been overcome with more time to try other methods and improve custom-cpm itself.
Chapter 5. Future work
As my project did not achieve satisfactory performance in optical flow estimation for large-displacement motion, there is significant work that can be done in the future to extend the project.
Compared to CPM-flow, my project still has three obvious defects on which future work can focus. The first is time consumption: so far, the time consumption is five times that of CPM-flow. At least two areas can be improved, namely the data structures and the programming language. As the matching process runs around 10 million times in total, a highly efficient data structure would no doubt make searching and matching more effective. Furthermore, as mentioned in the previous section, porting the MATLAB code to C/C++ could significantly decrease the time consumption. The second defect is the outlier handling method, which struggles to distinguish correct correspondences from outliers. A more reliable occlusion handling method could help, because most wrong correspondences tend to appear in occluded areas. The third defect is the poor interpolation in empty areas, which may be because the empty areas are ill-defined; alternatively, the EpicFlow interpolation method may need to be modified. In addition, other descriptors that carry more information could be used in the matching process to improve the accuracy of the generated matches. Future work can address at least these three defects.
Chapter 6. Conclusion
In conclusion, my project has been inspired by [7] and implements a coarse-to-fine
PatchMatch optical flow estimation algorithm in MATLAB. I conducted a series of
experiments to optimise the implementation as well as analyse the effect of parameters.
This project combines coarse-to-fine hierarchical architecture and PatchMatch with
propagation and random search to decrease the noise and introduce global
regularization. It also uses SIFT descriptors to characterise the seed pixel in order to
obtain accurate correspondences. Finally, EpicFlow interpolates the semi-dense optical flow into a dense optical flow. Unfortunately, due to the limited efficiency of this implementation, real-time optical flow estimation cannot be realised.
Compared to the original CPM-flow, my implementation can estimate large
displacement motion to some extent, but performs poorly when the large displacement
is accompanied by occlusions. Nevertheless, it has its own advantage in that it can
preserve many details and tiny structures, which is a significant benefit considering
both CPM-flow and EpicFlow perform poorly in terms of detail preservation. Moreover,
although the time consumption is higher than that of CPM-flow, it is still faster than
many other dense optical flow estimation methods.
In the future, studies should aim to improve the computational efficiency and
incorporate an occlusion handling method so as to improve the accuracy on the
boundaries and occlusion areas.
References
[1] L. Bao, Q. Yang and H. Jin, "Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow", IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 4996-5006, 2014.
[2] C. Barnes, E. Shechtman, A. Finkelstein and D. Goldman, "PatchMatch: A randomized correspondence algorithm for structural image editing", ACM Transactions on Graphics, vol. 28, no. 3, p. 1, 2009.
[3] T. Brox and J. Malik, "Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 500-513, 2011.
[4] C. Liu, J. Yuen and A. Torralba, "SIFT Flow: Dense Correspondence across Scenes and Its Applications", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978-994, 2011.
[5] P. Dollar and C. Zitnick, "Fast Edge Detection Using Structured Forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1558-1570, 2015.
[6] B. Horn and B. Schunck, "Determining optical flow", Artificial Intelligence, vol. 17, no. 1-3, pp. 185-203, 1981.
[7] Y. Hu, R. Song and Y. Li, "Efficient Coarse-to-Fine PatchMatch for Large Displacement Optical Flow", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5704-5712, 2016.
[8] L. Xu, J. Jia and Y. Matsushita, "Motion Detail Preserving Optical Flow Estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1744-1757, 2012.
[9] Y. Li, D. Min, M. Brown, M. Do and J. Lu, "SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs", IEEE International Conference on Computer Vision (ICCV), pp. 4006-4014, 2015.
[10] J. Revaud, P. Weinzaepfel, Z. Harchaoui and C. Schmid, "EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
Appendix 1
Appendix 2
Appendix 3
Here is the file list of my software. My own work includes "collection_of_data_processing.m", "demo.m", "init_match.m", "main.m", "pyramid.m", "random_search.m", "custom_match.mexw64" and "seed_set.m".
The files "computeColor.m", "flowToColor.m", "readflo.m" and "readFlowFile.m" are based on Deqing Sun's work.
The files "desc_generate.m", "mexDenseSIFT.mexw64" and "mexDiscreteFlow.mexw64" are taken from the online code of SIFT Flow: C. Liu, J. Yuen and A. Torralba, "SIFT Flow: Dense Correspondence across Scenes and its Applications", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2010.
Other files are prepared for demos and testing.
At the bottom of the "demo.m" file, there is a piece of code for a general test of correctness. This test requires a ground-truth "*.flo" file. I have provided three ".flo" files for testing: "frame_0005.flo", "frame_0017.flo" and "frame_0040.flo", corresponding to three groups of images. For example, when estimating the flow for the image pair "frame_0005.png" and "frame_0006.png", we can uncomment the code at the bottom of "demo.m" and change the ground-truth path to "frame_0005.flo". Running the demo then displays the average end-point error in MATLAB.
A more accurate test requires using EpicFlow to generate the dense optical flow and then using the "compute AEE" code in "collection_of_data_processing.m".
As noted above, to conduct an experiment, modify the paths of "image1" and "image2" in the "demo.m" file; the struct "model" is also modifiable. For a rough end-point error, use the code at the bottom of "demo.m"; for a more accurate comparison, use the "compute AEE" module in "collection_of_data_processing.m".
All of my code was written and tested on Windows 10 64-bit with MATLAB R2015b. The MEX module in MATLAB is required.
Appendix 4
README:
This is the software package for my individual project for COMP4560 at the Australian National University:
Qiang Duan, Efficient Dense Optical Flow Estimation from Two-Images, 27th October 2016
Email: [email protected]
Usages:
All the MATLAB files were tested under Windows 10 64-bit on an Intel i7-6700K CPU.
With default parameters, it takes about 22 s to compute the optical flow for a 436*1024 image.
This software requires the MEX module to compile and invoke some functions.
Run 'demo.m' first. The parameters in the struct "model" can be modified.
After the result is derived, use 'collection_of_data_processing.m' to generate a .txt file of flow values, and generate an edge file by running the SED_edge toolbox (not provided here). Combining the .txt file and the edge file, EpicFlow can generate the final dense optical flow.
'main.m' is the function containing the main part of this software.
'random_search.m' implements the propagation and random-search steps.
'desc_generate.m' generates the descriptors.
'init_match.m' generates the initial seed positions.
'pyramid.m' builds the image pyramid.
'seed_set.m' generates the seed set containing the pixels' positions.
'custom_match.mexw64' is a MEX function that computes the match cost between two descriptors.
Reference:
@inproceedings{Butler:ECCV:2012,
  title     = {A naturalistic open source movie for optical flow evaluation},
  author    = {Butler, D. J. and Wulff, J. and Stanley, G. B. and Black, M. J.},
  booktitle = {European Conf. on Computer Vision (ECCV)},
  editor    = {{A. Fitzgibbon et al. (Eds.)}},
  publisher = {Springer-Verlag},
  series    = {Part IV, LNCS 7577},
  month     = {oct},
  pages     = {611--625},
  year      = {2012}
}
@inproceedings{revaud:hal-01142656,
  title     = {{EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow}},
  author    = {Revaud, Jerome and Weinzaepfel, Philippe and Harchaoui, Zaid and Schmid, Cordelia},
  booktitle = {{Computer Vision and Pattern Recognition}},
  year      = {2015}
}
C. Liu, J. Yuen and A. Torralba. SIFT Flow: Dense Correspondence across Scenes and
its Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), 2010.