Efficient Dense Optical Flow
Estimation from Two Images
Qiang Duan
u5541596
Supervised by:
Dr. Yuchao Dai
A thesis submitted in partial fulfilment of the degree of
Bachelor of Advanced Computing (Honours) at
The Research School of Computer Science
Australian National University
October 2016
The Australian National University | 1
Acknowledgements:
I would like to convey my highest respect and deepest gratitude to everyone who helped
me to complete this individual project and report.
Above all, I would like to thank my supervisor, Dr. Yuchao Dai, for his clear
instructions, rigorous manner and patient guidance in both my academic research and
report writing. In the very beginning, he gave me a detailed introduction about this
project and helped me to analyse how the algorithm should be implemented. As the
project proceeded, he kept encouraging me to improve the quality of the
implementation and conduct further research. This positive and rigorous academic
attitude will certainly help me perform more thorough academic research in the future.
I would also like to give a special thanks to Dr. Dingfu Zhou who gave me a tutorial in
a necessary module of my project, which helped me to generate the visual and dense
results.
Last but not least, I would like to thank the course convener, Weifa Liang, who
provided us with the tutorial room and taught us about presentations and report writing.
In addition, class feedback on the mid-term presentations was helpful in improving our
presentation skills.
Declaration:
Except where otherwise indicated, this project is entirely my own work.
Qiang Duan
27th October 2016
Abstract:
Optical flow estimation is a key component of many computer vision tasks. Aiming to
build per-pixel correspondences between two images, dense optical flow estimation
has broad applications in this field. This paper introduces a MATLAB
implementation of dense optical flow estimation that combines a coarse-to-fine scheme
with the PatchMatch matching method, referred to as CPM (Coarse-to-fine
PatchMatch). The method is inspired by the nearest neighbour field (NNF) method and
incorporates the efficient random search strategy of the PatchMatch
algorithm into a coarse-to-fine scheme for optical flow. In addition, by using the hybrid
programming techniques of C and MATLAB, the efficiency of this method is enhanced
as much as possible. The implementation of this method verifies and validates the
efficiency and accuracy of its performance. Unlike other techniques that are noisy and
lack accuracy in terms of large displacement optical flow, such as NNF, the CPM
method uses a propagation step with a random search strategy on each level to decrease
noise. In addition, the coarse-to-fine scheme allows the tiny structures to be extracted
level by level in order to improve the performance in large displacement situations.
Furthermore, this method uses a seed set at specific positions in the frames to represent
sets of adjacent pixels, achieving highly efficient optical flow estimation, and then uses an
edge-preserving interpolation method (EpicFlow) to realise dense optical flow
estimation. The results are then compared with the original CPM-Flow and other related
methods. As my existing knowledge is not sufficient to fully accelerate the MATLAB
code or overcome the inherent limitations of MATLAB, my implementation is slower
than the original but generates acceptably accurate results.
Contents
Acknowledgements: ......................................................................................... 1
Abstract: ........................................................................................................... 2
Contents .......................................................................................................... 3
Chapter 1. Introduction .................................................................................... 4
1.1 Overview: ................................................................................................ 4
1.2 Outline .................................................................................................... 5
Chapter 2. Background and Literature Review ................................................ 6
2.1 Background ............................................................................................. 6
2.2 Literature Review .................................................................................... 7
Chapter 3. Coarse-to-fine PatchMatch methodology ....................................... 9
3.1 Basic matching ........................................................................................ 9
3.2 Coarse-to-fine scheme .......................................................................... 12
3.3 SIFT descriptor and match cost ............................................................ 14
3.4 Outlier handling and interpolation method ............................................ 14
3.5 Improving efficiency .............................................................................. 15
Chapter 4. Experiments ................................................................................. 16
4.1 Evidence to substantiate previous statements ...................................... 16
4.1.1 Performance variation between pixel-based and patch-based
matching .................................................................................................. 16
4.1.2 The influence of cell size on SIFT descriptors................................. 18
4.1.3 The effect of the outlier handling method ........................................ 18
4.2 Effects of parameters ................................................................................ 19
4.3 Benchmark testing on MPI-Sintel database ................................................. 22
Chapter 5. Future work .................................................................................. 26
Chapter 6. Conclusion ................................................................................... 27
References .................................................................................................... 28
Appendix ........................................................................................................ 30
Chapter 1. Introduction
1.1 Overview:
As an important component of many computer vision tasks, optical flow has been
studied extensively for a long period of time. There are abundant research studies and
pioneering works in this field. However, for some specific situations, such as large
displacements and occlusions, optical flow estimation remains a challenging task and
more research is required to determine how a reliable optical flow from consecutive
video frames can be obtained. A coarse-to-fine scheme is commonly used to extract
details but often fails to estimate large displacements due to a loss of details when the
motion is fast and the propagation of errors when outliers accumulate from the coarsest
to finest level.
Recently, Hu, Song and Li [7] proposed a new algorithm known as Coarse-to-fine
PatchMatch (CPM), which is designed for large displacement optical flow estimation.
Inspired by PatchMatch [2], a very efficient and effective image processing algorithm,
the CPM algorithm adopts an ingenious solution that uses propagation with random
search in a coarse-to-fine scheme. Since the optical flow is closely related to the nearest
neighbour field algorithm, through the propagation with random search strategy of
PatchMatch, the efficiency of computation can be remarkably enhanced. Moreover, in
CPM flow, the seed set is applied for further reduction of computation time. The idea
is to use a pixel in a specific position to represent a patch of surrounding pixels. The
correspondences derived from the seed set represent semi-dense optical flow. Then, the
interpolation method of EpicFlow [10] is used to interpolate the correspondences to
realise the dense optical flow. Aside from the PatchMatch algorithm, the coarse-to-fine
scheme and the EpicFlow interpolation, the CPM algorithm also uses SIFT descriptors
of SIFT Flow [4] to calculate the match cost and a forward-backward consistency check
[1] to remove the outliers.
In this project, the primary goal is to reconstruct the code for the CPM algorithm. As
the C++ source code of CPM has not yet been released, my supervisor asked me to
write the CPM code according to the algorithm’s description in [7]. During this period,
I read plenty of materials related to this algorithm, including but not limited to the
PatchMatch algorithm [2], EpicFlow [10], SIFTFlow [4] and forward-backward
consistency check [1]. Since I have limited knowledge of computer vision and optical
flow, I was required to first learn about basic concepts and background knowledge. For
the programming software, I chose MATLAB to realise the algorithm for two main
reasons. The first reason is that MATLAB offers many advantages in terms of data and
image processing, so I could easily use these features for image processing and get on
the right track as soon as possible. The second reason is that the original code is said to
be realised in C++ by Hu, Song and Li, though it has not yet been released. To simply
repeat their work is not a credible way to perform academic research and it would not
be appropriate to attempt to improve the work of another scholar. Thus, the most
suitable approach was to combine my own learning experience with the functions
offered by MATLAB software. Once I had completed the MATLAB code, I was
required to improve the efficiency of computation, which was achieved by using hybrid
programming and optimising the structure of the code. These measures enhanced the
efficiency of the code.
1.2 Outline
The structure of this report is arranged as follows: Chapter two presents an introduction
of literature relevant to the research topic in order to discuss the foundation theories
that make the CPM algorithm realisable. In Chapter 3, the framework of the CPM
algorithm is presented and the main features are discussed. In Chapter 4, the results of
experiments and parameter analysis will be presented and discussed. Chapter 5
discusses future work in this area and Chapter 6 provides a conclusion of the study.
Chapter 2. Background and Literature Review
2.1 Background
The aim of this project is to realise a state-of-the-art efficient dense optical flow
estimation from two images. The optical flow is defined as the distribution of velocities
of an object in an image. It can be represented by arrows or colour patches and provides
information about the spatial arrangement of images and how it changes. The optical
flow is generally applied in computer vision to characterise the motion of objects in
consecutive images or video frames. Through the implementation process, I developed
valuable research skills and improved my self-learning ability with regard to advanced
technologies in computer science. This project also required me to apply relevant
knowledge in practice, which was not only a significant challenge but also a necessary
step in becoming a more qualified university student. More specifically, during the
initial stage of the research process, I was required to understand general concepts and
acquire basic knowledge of the optical flow estimation. Following this, as there are
different methods of optical flow estimation, I needed to compare state-of-the-art
methods in order to determine the advantages and disadvantages of each (see Section
2.2). Then, implementing the random search strategy PatchMatch in coarse-to-fine
fashion was an essential part of this project. The algorithm was published in 2016 by
Hu et al. with the exception of the source code. While the original code was
implemented in C++, my supervisor and I decided to implement the algorithm in
MATLAB because this software has many useful features in image and data processing
that can facilitate improvements to the performance of the code. Moreover, we
originally intended to pursue an efficient real-time implementation on the CPU, since
most existing high-performance methods rely on the OpenCV C/C++ library and the
GPU. However, as the project progressed, MATLAB's limited computational efficiency
became a bottleneck. In other words, even after the efficiency had been improved as
much as possible, the algorithm realised in MATLAB was not fast enough for real-time
applications. Therefore, we did not succeed in bringing the MATLAB code to real-time
performance.
2.2 Literature Review
Since we are limited by space and time, only the most closely related literature will be
reviewed.
I first refer to the seminal paper by Horn and Schunck [6], who present a classic
energy minimisation method for the optical flow estimation. This paper provides me
with insights into the general concept of optical flow, which the authors describe as the
distribution of velocities of an object in an image. The optical flow can be represented
by arrows or colour patches, which provide information about the spatial arrangement
in images and how it changes. The CPM-flow paper [7] offers a detailed description of
the CPM algorithm combining the coarse-to-fine scheme and the PatchMatch matching
method. Through this paper, I learn about state-of-the-art optical flow methods devised
in recent years. Based on the NNF technique, Bao et al. [1] apply the PatchMatch
algorithm to their edge-preserving method. The matching algorithm is effective, but
without further measures to improve the performance in large displacement situations,
it often fails when faced with significant occlusions. The descriptor matching method
is introduced by Brox and Malik [3], who offer an innovative approach to the matching
procedure in optical flow estimation. Many of these new methods are inspired by
descriptor matching techniques, such as SIFTFlow. Moreover, Xu et al. [8] propose a
coarse-to-fine scheme that is used to refine the flow on each level. However, as its
matching is sparse and its result depends on accurate initialisation, the performance of
the scheme is unsatisfactory when there are small details and large motion.
The following studies are all closely related to my project. The PatchMatch [2]
algorithm was proposed in 2009, which was initially used as an interactive image
editing tool. It uses the NNF technique for matching and adopts a propagation strategy
with random search to find the nearest-neighbour matches between two images. The
propagation step ensures that the pixel or the patch which has a high probability of
being the right match is checked. The random search step ensures the optimisation will
not fall into local optimisation. In my project, the propagation with random search
strategy is essential to achieving fast and optimised matching between two images.
Then, the SIFTFlow [4] provides powerful SIFT descriptors that extract features from
images and extract information from these features to a 128-dimensional descriptor for
each pixel. By using descriptors in the matching process, the complexity of the
descriptor ensures every patch that contains different features has sufficient differences,
thereby achieving a more accurate matching result. Furthermore, a fast edge-preserving
interpolation is realised by EpicFlow [10] as it can interpolate a sparse or semi-dense
optical flow to a dense optical flow without losing the edge information. The
“structured edge detector” (SED) [5] is used to generate the edge data used by EpicFlow.
The EpicFlow uses the interpolation method on the results of DeepMatching, but the
efficiency of DeepMatching limits the performance of EpicFlow.
Chapter 3. Coarse-to-fine PatchMatch
methodology
In this chapter, I will present the entire framework of CPM, including the basic
matching method, the coarse-to-fine framework, the descriptor matching method, the
EpicFlow interpolation method, the outlier handling method and the efficiency improvements.
Overall, the CPM is a hierarchical nearest neighbour field algorithm that blends a
random search strategy and a propagation approach to realise efficient optical flow
estimation in large displacement scenes.
3.1 Basic matching
In order to reduce the computation time and ensure satisfactory accuracy, in CPM, the
seeds in specific positions are chosen to represent surrounding pixels (Figure a). Thus,
the computation time can be reduced significantly because the matching step finds the
best correspondence between the seeds rather than every pixel. More specifically, the
specific position of the seed is set on the cross point of an image grid with spacing of d
pixels. In other words, in every d × d grid, there is only one seed to represent this grid.
The matching process can be formally described as follows: for two images I1 and I2 and a
seed set S = {s_m} positioned at {p(s_m)}, CPM determines the flow of each
seed: f(s_m) = M(p(s_m)) − p(s_m), where M(p(s_m)) is the
corresponding position in I2 of the seed s_m located at p(s_m) in I1 [7]. The cost function is shown below:

f(s_m) = arg min_{f(s_i)} C(f(s_i)), s_i ∈ {s_m}   (1)

where C(f(·)) denotes the match cost (distance) between two seeds; its computation
is given in Section 3.3.
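To make the seed layout concrete, here is a small Python sketch (the thesis implementation is in MATLAB; the function name and NumPy usage are my own illustration, not the project code):

```python
import numpy as np

def make_seed_grid(height, width, d):
    """Place one seed at each cross point of an image grid with spacing
    d pixels, so every d x d cell is represented by a single seed.
    Returns an (n, 2) array of (row, col) seed positions."""
    rows = np.arange(0, height, d)
    cols = np.arange(0, width, d)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)

# A 20 x 15 image with spacing d = 5 yields a 4 x 3 grid of 12 seeds.
seeds = make_seed_grid(20, 15, 5)
```

Matching then only has to find correspondences for these n seeds rather than for every pixel, which is where the speed-up reported in Section 4.1.1 comes from.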
In Section 4.1.1, the table shows that the computation time is remarkably decreased by
using seeds (patch) instead of every pixel and indicates that the accuracy has not been
compromised (Figure 1.1-1.4).
After the construction of the seed set, the neighbour system is completed. In
PatchMatch and CPM, the random search and propagation strategy is then performed
to search for correspondences. The illustration is shown in Figure c [2]. In CPM, it is
performed between seeds. In the original scheme, PatchMatch and CPM perform the
propagation and the random search in an interleaved manner. For example, if
we use P to represent propagation and R to represent random search, the matching
step proceeds as P1, R1, P2, R2, …, Pn, Rn. A complete propagation pass
is required before the random search step is undertaken. As the name suggests, the
propagation strategy means that the examined (or initialized on coarsest level) flow
values will be propagated from neighbour seeds to the current seed in the current
iteration and will also be propagated from the current seed to other seeds in later
iterations. This requires a full pass over all the seeds. In my opinion, this method is a waste
of time because the propagation step could be blended into the random search. Thus, in
the initial stage, I adjusted the propagation technique. As the random search is also
performed in a loop fashion, I blended the propagation into the random search. However,
this method decreased the accuracy while not significantly decreasing the running time.
Thus, I abandoned this approach and reverted back to the original propagation.
The propagation process propagates the flow value before the random search, by scan
order (forward) in odd iterations and by reverse scan order in even iterations. The logic
behind the propagation is that: in a neighbour system, to improve the current pixel f(x,y),
we use the known correspondences of f(x-1,y) and f(x,y-1), assuming f(x,y) is the same
as f(x-1,y) and/or f(x,y-1) [2]. For example, if we already have a good mapping at (x-1,y)
and (x,y) is contiguous with or has the same texture as (x-1,y), then the
correspondence at (x-1,y) has a high probability of carrying over to (x,y). Thus, given
that D(v) denotes the distance of two patches at position (x,y), we take the value for
f(x,y) to be the arg min{D(f(x,y)),D(f(x-1,y)),D(f(x,y-1))} in odd iterations [2].
Moreover, in even iterations, we take the value of arg
min{D(f(x,y)),D(f(x+1,y)),D(f(x,y+1))}. The process of computing match cost (also
referred to as distance) is shown in Section 3.3.
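The odd/even sweeps described above can be sketched in Python as follows (a simplified illustration with hypothetical names, not the thesis's MATLAB code; cost_fn stands in for the SIFT match cost of Section 3.3):

```python
import numpy as np

def propagate(flow, cost_fn, iteration):
    """One propagation sweep over a 2-D grid of seed flows.

    flow: (H, W, 2) array of current flow guesses per seed.
    cost_fn(pos, f): match cost of assigning flow f to the seed at pos.
    Odd iterations sweep in scan order and borrow from the left/top
    neighbours; even iterations sweep in reverse scan order and borrow
    from the right/bottom neighbours, as in PatchMatch [2].
    """
    h, w, _ = flow.shape
    if iteration % 2 == 1:  # forward sweep: left/top neighbours
        ys, xs, dy, dx = range(h), range(w), -1, -1
    else:                   # reverse sweep: right/bottom neighbours
        ys, xs, dy, dx = range(h - 1, -1, -1), range(w - 1, -1, -1), 1, 1
    for y in ys:
        for x in xs:
            best = cost_fn((y, x), flow[y, x])
            for ny, nx in ((y + dy, x), (y, x + dx)):
                if 0 <= ny < h and 0 <= nx < w:
                    c = cost_fn((y, x), flow[ny, nx])
                    if c < best:  # adopt the neighbour's flow if cheaper
                        best = c
                        flow[y, x] = flow[ny, nx]
    return flow
```

Because the sweep visits seeds in scan order, a good correspondence found at one seed can spread across the whole grid within a single iteration.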
The random search process is much simpler. After the current level is initialised (when
the current level is the coarsest level) or the correspondence passed from the last level
is accepted (when the current level is not the coarsest level), an approximate
correspondence for current seeds is calculated. Therefore, the random search process
identifies the optimum potential correspondences in a specific area. The area is formally
defined by equation 2:
u_i = v0 + w·α^i · R_i   (2)

where u_i is the i-th candidate correspondence for the current seed, v0 is the initial
correspondence, R_i is a uniform random value in [-1,1] x [-1,1], w is the maximum
search radius, and α is a scale ratio usually fixed to 1/2. The equation describes a
gradually shrinking search area for i = 0, 1, 2, … until the search radius w·α^i is less
than 1. Over this sequence of random searches, we always retain the correspondence
that has the lowest match cost (see Section 3.3).
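Equation 2 translates into a short loop. The sketch below is my own Python illustration (in line with PatchMatch, each new candidate is sampled around the best correspondence found so far; cost_fn again stands in for the match cost of Section 3.3):

```python
import numpy as np

def random_search(v0, cost_fn, w=8, alpha=0.5, rng=None):
    """Sample candidates u_i = best + w * alpha**i * R_i, with R_i
    uniform in [-1, 1]^2, shrinking the radius until w * alpha**i < 1,
    and keep the candidate with the lowest match cost (Eq. 2)."""
    rng = np.random.default_rng(rng)
    best = np.asarray(v0, dtype=float)
    best_cost = cost_fn(best)
    i = 0
    while w * alpha ** i >= 1.0:
        cand = best + w * alpha ** i * rng.uniform(-1.0, 1.0, size=2)
        c = cost_fn(cand)
        if c < best_cost:
            best, best_cost = cand, c
        i += 1
    return best, best_cost
```

With w = 8 and α = 1/2, the loop tries radii 8, 4, 2 and 1, so only four candidates are examined per seed, yet the shrinking radius lets the search escape local optima.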
3.2 Coarse-to-fine scheme
The basic matching section introduced the propagation and random search strategy.
Like PatchMatch, this matching alone is noisy, and many outliers may be generated.
One way to mitigate this is to increase the patch size, which can be achieved by using
the seeds. However, this results in a loss of accuracy. Thus, the coarse-to-fine scheme is
introduced to provide global regularization and maintain a satisfactory level of accuracy.
It is a hierarchical structure and is usually referred to as an image pyramid, as shown in
Figure (a) [7]. In the pyramids, there are several levels from top to bottom. These levels
indicate the degree of precision: the higher the level, the coarser the image. Given a
scale factor n, we can construct a k-level pyramid. The factor is generally set to 1/2,
which means the resolution of the image is halved with every level up. For two images
I1 and I2, the l-th level of the pyramid of Ii is denoted Ii^l, i ∈ {1, 2},
l ∈ {0, 1, …, k − 1} [7]. The top level of the pyramid is the coarsest level and the
bottom level is the finest level, which usually contains the raw image. The aim is to
find the correspondence of every seed in I1^0 and I2^0, starting from I1^(k−1) and
I2^(k−1).
The seed should be set up on each level at the same position for both images. The
position of the representing pixel is recorded in each seed. For example, for an image
with 1024x512 pixels, to get a seed set that has a spacing of 4 pixels on each level, we
should construct a 256x128 seed set. The seeds are set up at the cross points, such as
(1, 1), (1, 5), (5, 1), (5, 5), etc. Since CPM does not adopt any sub-pixel accuracy and
the number of seeds is always the same, on some coarser levels several seeds might
represent the same pixel. All positions inside the seeds are truncated to the nearest
integer. This means that the propagation with random search is extensively executed on
coarser levels, which could ensure the robustness of correspondences on higher levels
[7].
The propagation and random search are then performed on each level. The first step is
to initialise the flow value on the coarsest level because the process will start from the
coarsest level and pass the value to the next level for initialising. The initial flow value
on the top level could be a random value. Thus, performing a propagation and random
search on this level will generate a better flow value that is used to initialise the next
level according to equation (3) [7]:

{f(s^l)} = (1/n) · {f(s^(l+1))}, l < k − 1   (3)

where n is the scale factor, usually set to 1/2, s^l is the seed set on level l, and
{f(s^(l+1))} is the flow passed down from the previous level. The propagation between
adjacent levels continues until the finest level generates the final result.
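The overall coarse-to-fine driver can be sketched as follows. This is a Python illustration under my own naming, not the thesis's MATLAB code; match_on_level stands for one round of propagation plus random search on a single pyramid level:

```python
import numpy as np

def coarse_to_fine(match_on_level, k=5, n=0.5):
    """Run matching from the coarsest level k-1 down to level 0.

    match_on_level(level, init_flow) refines the seed flows on one
    pyramid level and returns them.  Between levels the flow is
    rescaled by 1/n (Eq. 3): with n = 1/2 the image doubles in
    resolution at each finer level, so the flow doubles too.
    """
    flow = None  # the coarsest level starts from a random initialisation
    for level in range(k - 1, -1, -1):
        flow = match_on_level(level, flow)
        if level > 0:
            flow = flow / n  # pass the scaled flow down to the next level
    return flow
```

The loop makes explicit that errors refined away on a coarse level never reach the finer levels, which is the point of performing the consistency check and matching at every level.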
The full algorithm pseudocode is given in [7].
3.3 SIFT descriptor and match cost
In the dense SIFT flow [4], an effective descriptor is proposed to extract features for
every pixel in an image. The SIFT descriptor combines the features of surrounding
pixels to derive a 128-dimension descriptor for a single pixel. The process of generating
SIFT descriptors is very fast so that adopting this approach does not significantly
increase the running time but provides a powerful and feasible module that can be used
in the matching process to compute the distance between seeds. In this section, the code
for generating SIFT descriptors is cited from Ce Liu’s website, which is also mentioned
in [4]. Furthermore, the matching process is performed to compute the summation of
absolute difference between two 128-dimension descriptors as the match cost (distance).
The equation is shown in Equation 4, where d^1 is the descriptor from the first image
and d^2 is the descriptor from the second image:

C = Σ_{i=1}^{128} |d_i^1 − d_i^2|   (4)
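Equation 4 is a plain sum of absolute differences (SAD) over the 128 descriptor dimensions. A minimal Python sketch (the thesis computes this in a C MEX function; the name here is mine):

```python
import numpy as np

def match_cost(d1, d2):
    """Sum of absolute differences between two 128-dimensional SIFT
    descriptors (Eq. 4); a lower cost means a better match."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    assert d1.shape == d2.shape == (128,)
    return float(np.abs(d1 - d2).sum())
```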
According to [4], a single descriptor is generated based on the features of a patch. Thus,
I assume that the patch size of a SIFT descriptor and the seed spacing of CPM might
have a correlation so that applying the same patch size will achieve a better performance.
The experimental results confirm my assumption and further indicate that when the size
of the SIFT patch is less than the seed spacing of CPM, there are more outliers; on the
other hand, when the size of the SIFT patch is larger than the size of the CPM patch, as
the SIFT patch grows, the result becomes fuzzy. A SIFT patch size that is the same size
or slightly smaller will help improve the correspondences. The results are shown in
Section 4.1.2. The experiments also demonstrate that the distance between two seeds
rarely appears to be the same when using descriptors. Therefore, the correspondences
generated by this matching process are convincing.
3.4 Outlier handling and interpolation method
As in [1] and [7], I also adopt a forward-backward consistency check. This is useful in
detecting the occlusions and removing the outliers. In order to ensure the error will not
propagate and accumulate level by level, the forward-backward consistency check is
performed at every level. The forward-backward consistency check involves
performing the programme twice, once in a forward direction to find the
correspondences from I1 to I2 and once in a backward direction to find the
correspondences from I2 to I1. If the difference at the same position is larger than a
preset value, the correspondence at that position will be treated as an outlier and
removed. Furthermore, to improve the accuracy, a threshold is set to a specific value to
limit the upper bound of the match cost. If the match cost is larger than the threshold,
even if the match cost is the lowest value of all iterations, the correspondence will not
be recorded. The experiments show that these two outlier handling methods
considerably improve the accuracy, but the forward-backward check doubles the
running time at each level and hence roughly doubles the total running time. This is
the inevitable trade-off between efficiency and accuracy. However, the noticeable
improvement in accuracy is worth the extra time.
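The forward-backward check can be sketched in Python as below. This is my own simplified illustration (integer flows on the seed grid, nearest-seed lookup), not the thesis's MATLAB code:

```python
import numpy as np

def consistency_check(forward, backward, tol):
    """Mark a seed as an outlier when the forward flow and the backward
    flow at its target disagree by more than tol pixels.

    forward, backward: (H, W, 2) integer (dy, dx) flows on the seed grid.
    Returns a boolean (H, W) mask, True where the match is kept.
    """
    h, w, _ = forward.shape
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            fy, fx = forward[y, x]
            ty, tx = y + int(fy), x + int(fx)  # target seed in the second image
            if 0 <= ty < h and 0 <= tx < w:
                # A consistent pair satisfies backward(target) ≈ -forward(source).
                diff = forward[y, x] + backward[ty, tx]
                keep[y, x] = np.abs(diff).sum() <= tol
    return keep
```

Seeds whose target falls outside the image, as happens under occlusion, are rejected outright, which matches the removal of the ghosting discussed in Section 4.1.3.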
After the outlier handling process has been performed, the primary flow vector can be
derived. However, the flow is not dense until interpolation in the next step. With regard
to the interpolation method, in [10], the EpicFlow provides an edge-preserving
interpolation method. By using the SED (structured edge detector) [5], the edge data of
the source frame is derived. Then, given the edge data, the EpicFlow performs
interpolation within each block divided by the edge, which can preserve the edge as
much as possible. Finally, the primary semi-dense flow becomes a dense flow. At this
point, the whole process of generating CPM flow including the interpolation step is
complete.
3.5 Improving efficiency
As MATLAB is good at image and data processing but has low computational
efficiency, the running time of my MATLAB code is quite long compared with the
original CPM. Thus, guided by the output of MATLAB's built-in profiler (the "profile" function), the
core function that is used to compute the distance between two seeds is adapted to C
language and integrated into a MEX file. A MEX file can provide an interface between
MATLAB and the C function. Generally speaking, C is much faster than MATLAB.
The running time is shortened by more than one-third without any loss of accuracy. The
optimisation of the code structure is also useful in increasing efficiency, such as by
removing unnecessary functions and changing some complex data structures to vectors
that are easy for MATLAB to process. Using this step, the computation time is
decreased by a further one-fifth. Nevertheless, the running time is still five times higher
than that of the original CPM.
Chapter 4. Experiments
In this section, I will first provide some evidence to substantiate previous statements.
Secondly, I will change some parameters in the MATLAB code to explore their effects
and try to find a relatively good combination. Finally, I will use an online benchmark
dataset to evaluate the performance of my project, which is the most convincing way to
assess its validity and application. Unfortunately, as the implementation is not fast
enough, real-time optical flow estimation is not realised.
4.1 Evidence to substantiate previous statements
4.1.1 Performance variation between pixel-based and patch-based
matching
In Section 3.1, the computation time of pixel-based optical flow estimation is said to be
much higher than that of patch-based optical flow estimation. The results are shown
here:
As two for-loops are used in my code to
traverse an image, the time complexity of
this part is O(n^2). Thus, an increase in
the number of seed points will significantly
increase the computation time. The seed
spacing determines the number of seeds: for
the pixel-based method, the spacing can be
considered to be 1; for the patch-based
method, the seed spacing is 5, which means
that, theoretically, the pixel-based optical
flow will take 25 times (5^2) longer than the
patch-based one. The measured results show a
multiple of around 20.
Although there is a roughly 20-fold difference between the computation times, the
results do not show any significant variation in quality. As Figures 1.1-1.4 show, the
1024x436 results in the left column are not noticeably better than the 205x88 results
in the right column of the same row. However, the number of levels does affect the
accuracy in a visible way: in the bottom two images, there are fewer outliers. While
the bottom-right image is still noisy, it is acceptable and can be interpolated into a
smooth flow.
Table 1: Computation time of pixel-based vs. patch-based optical flow estimation

Method                              Time (s)
One level:
  Pixel-based optical flow          188.28
  Patch-based optical flow           11.24
Three levels:
  Pixel-based optical flow          579.42
  Patch-based optical flow           27.59
4.1.2 The influence of cell size on SIFT descriptors
Based on the comparison in Figure 2, we can see that if the size of the SIFT patch is
too small, such as 1, the flow will have many outliers, some of which are removed, as
shown by the white regions in the third image. Since the white regions can be
interpolated by EpicFlow, a small patch of white does not matter. A flow with a SIFT
patch and seed patch of equal size will generate a better and more satisfactory result. If
the size of the SIFT patch is much larger than the seed patch, the edge of the flow will
be too fuzzy to retain its shape. Therefore, the size of the SIFT patch should be equal
to or slightly smaller than the seed spacing in practice.
4.1.3 The effect of the outlier handling method
In Figure 3, the bottom two images demonstrate the effect of the forward-backward consistency check. These flows are derived from large-displacement images in which some elements of the first image disappear in the next image (not shown here), which is referred to as occlusion. Without a consistency check, occlusions result in an incorrect flow. For example, in the top-right corner of the blade in the last image, the red colour is ghosting of the blade from the next image, which should not appear. After applying the consistency check, the ghosting is eliminated in the bottom-left image. Furthermore, the sharpness of edges is also improved, such as the outlines of the girl and the blade.
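The check described above can be sketched as follows. This is a minimal Python illustration (the thesis implementation is in MATLAB, and the array names are illustrative): a match at pixel p with forward flow f(p) is kept only if the backward flow at p + f(p) roughly cancels it.

```python
import numpy as np

def consistency_mask(fwd, bwd, threshold=1.0):
    """fwd, bwd: (H, W, 2) forward and backward flow fields, channels (x, y).
    Returns a boolean mask of consistent (non-occluded) pixels, where
    |f(p) + b(p + f(p))| < threshold."""
    H, W, _ = fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Target positions in the second image, rounded and clipped in-bounds.
    tx = np.clip(np.round(xs + fwd[..., 0]).astype(int), 0, W - 1)
    ty = np.clip(np.round(ys + fwd[..., 1]).astype(int), 0, H - 1)
    diff = fwd + bwd[ty, tx]          # ~0 for consistent matches
    return np.linalg.norm(diff, axis=2) < threshold
```

Pixels failing the check are marked as outliers (the white/empty regions discussed above) and left for the interpolation stage.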
4.2 Effects of parameters
There are eight parameters that can be edited. They are, respectively, the number of pyramid levels, the number of iterations, the seed spacing, the cell size of a SIFT patch, the maximum search radius, the number of random searches for each seed, the threshold for the matching cost and the range of the forward-backward consistency check. The default parameters are [5, 4, 5, 5, 8, 16, 2000, 100], which have already been tested on other images and balance time consumption against accuracy. The parameters can be divided into three types according to what they affect: time and accuracy, time only, and accuracy only. Accuracy is measured by the average end-point error (AEE), the per-pixel difference between the estimated flow and the ground truth, averaged over all pixels. The experiments use a controlled-variable method, changing a single parameter at a time and observing its effect. The test data are a set of large-displacement images on which my code does not achieve adequate performance; in other words, these images are sensitive to parameter changes. For each parameter, the experiment is conducted three times and the average is taken as the final result.
Note that all experiments are run on an Intel Core i7 4.0 GHz CPU.
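The AEE metric can be stated precisely with a short sketch. This is Python for clarity only (the experiments themselves were run in MATLAB): the end-point error at a pixel is the Euclidean distance between the estimated and ground-truth flow vectors, and the AEE is its mean over all pixels.

```python
import numpy as np

def average_endpoint_error(est, gt):
    """est, gt: (H, W, 2) flow fields. Returns the mean Euclidean distance
    between estimated and ground-truth flow vectors over all pixels."""
    return float(np.mean(np.linalg.norm(est - gt, axis=2)))
```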
The first type, which affects both time and accuracy, includes the number of levels, the seed spacing and the number of random searches. The results are shown in Figure 4.1 to Figure 4.4.
In this set of figures, the left Y-axis shows the time consumption and corresponds to the blue line (the line with square markers). The orange line with crosses shows the average end-point error, corresponding to the right Y-axis. Figure 4.1 demonstrates that the number of levels affects the time consumption linearly, but the average end-point error does not decrease linearly. Once the number of levels exceeds 3, there is little improvement, primarily because the number of levels determines the resolution of the low levels: as the number grows, the resolution decreases exponentially and becomes too coarse to distinguish, contributing little to the matching process. Thus, 3 levels is a sensible choice to balance time and accuracy.
In Figure 4.2, note that the time consumption drops rapidly (roughly quadratically) as the seed spacing increases, for the reason explained in Section 4.1.1.
In Figure 4.3, note that the number of random searches does not significantly affect the AEE, but the time increases with the number of random searches. The reason is obvious: the number of random searches directly determines the number of cost computations. Thus, for general images, 8 or 16 random searches are enough for accurate matching.
The number of iterations is another factor that affects performance. One iteration corresponds to one one-way propagation pass, in scan order or reverse scan order. Theoretically, the accuracy is positively correlated with the number of iterations; however, the time consumption increases linearly at the same time. Thus, a value around the crossover point of the two curves, such as 4 or 6, is an ideal choice.
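The one-way pass described above can be sketched in one dimension. This Python fragment only illustrates the scan-order idea (the thesis code is MATLAB and works on a 2-D seed grid); `cost(i, f)` stands in for the SIFT match cost of assigning flow value `f` to seed `i`.

```python
def propagate(flow, cost, reverse=False):
    """One one-way propagation pass: each seed adopts the flow of its
    already-visited neighbour if that flow has a lower match cost."""
    order = range(len(flow) - 1, -1, -1) if reverse else range(len(flow))
    prev = None
    for i in order:
        if prev is not None and cost(i, flow[prev]) < cost(i, flow[i]):
            flow[i] = flow[prev]      # adopt the neighbour's better match
        prev = i
    return flow
```

Alternating the scan direction between iterations lets good matches spread across the whole grid in a few passes.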
The remaining parameters are shown in Figure 5.1 to Figure 5.4. Some parameters primarily affect the AEE and barely affect the time, such as the scale of the consistency check (Figure 5.2), the matching threshold (Figure 5.3) and the cell size of the SIFT descriptor (Figure 5.4). This is because these parameters do not increase the time complexity; they affect the result only by restricting the threshold or the consistency-check scale. As for the cell size of the SIFT descriptor, it only changes the precision of the elements used in computing the match cost; the descriptor dimension does not change, so the time consumption is unaffected.
The accuracy as a function of the maximum search radius (Figure 5.1) shows a different trend: the AEE decreases at first but starts to increase beyond a certain point. This is probably because a large search field introduces more noise than a small radius, while an appropriately sized radius helps avoid falling into a local optimum. Thus, the maximum search radius should be chosen according to the extent of the motion: large-displacement motion warrants a relatively large search radius, and vice versa. If a large search radius introduces more outliers than expected, the outlier handling method should be reconsidered, though this situation did not occur in the experiments. The time also changes, because the radius is halved in each iteration; in effect, if log2(radius) is greater than or equal to the number of iterations, the time increases.
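The halving schedule behind that last observation can be made concrete with a short sketch (a Python illustration of the schedule only, not the MATLAB implementation):

```python
# The random-search radius is halved each iteration, so if log2(radius) >=
# number of iterations, the radius never shrinks to 1 before the iterations
# run out and every iteration keeps performing larger searches.

def radius_schedule(radius, iterations):
    """Radius used by the random search in each iteration."""
    sched = []
    for _ in range(iterations):
        sched.append(radius)
        radius = max(1, radius // 2)
    return sched

print(radius_schedule(8, 4))    # [8, 4, 2, 1]  -- shrinks to 1 in time
print(radius_schedule(16, 4))   # [16, 8, 4, 2] -- log2(16) = 4 >= 4 iterations
```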
Following the analysis of all these parameters, a relatively good combination is derived: 3 levels, a seed spacing of 5 (or 6), 16 random searches per iteration, 4 (or 6) iterations, a search radius of 8 (or 16 for large motion), a consistency-check scale of 100, a matching threshold of 2000 and SIFT descriptors with a cell size of 3. These parameters are used for the online benchmark in the next section.
4.3 Benchmark testing on MPI-Sintel database
Using the parameter combination derived in the previous section, my work is tested on the MPI-Sintel database. This is a challenging benchmark based on an open-source animated film. The test set has 1128 images divided into a clean version, in which motion blur and visual effects are removed, and a final version, extracted from the actual movie with full motion blur and visual effects. Each version contains twelve image sequences in different scenes. The MPI-Sintel dataset provides the visual results and ground truth so that different methods can be compared. In this section, my work is referred to as custom-cpm.
On the MPI-Sintel benchmark, custom-cpm ranks 62nd on both the final and clean sets as measured by the overall average end-point error, while CPM-flow ranks 8th and EpicFlow ranks 15th (rankings last checked on 22 October 2016). Table 2 shows the performance of custom-cpm, CPM-flow and EpicFlow.
Method        Time     AEE all   AEE matched   AEE unmatched
Clean set
  Custom-CPM   22 s      8.444      3.471         48.890
  CPM-flow     4.3 s     3.557      1.189         22.889
  EpicFlow     16.4 s    4.115      1.360         26.595
Final set
  Custom-CPM   22 s      9.467      4.514         49.781
  CPM-flow     4.3 s     5.960      2.990         30.177
  EpicFlow     16.4 s    6.285      3.060         32.564

Table 2: Test results on the MPI-Sintel dataset. "AEE matched" is the AEE on non-occluded areas; "AEE unmatched" is the AEE on occluded areas. Time consumption figures are taken from [7] and [10].
These statistics convey two things. First, the time consumption of custom-cpm is much higher than that of CPM-flow, so real-time optical flow estimation is impossible with this implementation. The reason might be MATLAB's limited computing performance. In the middle of the semester, I rewrote a crucial function that computes the match cost from MATLAB code into C and compiled it with MEX in MATLAB; the time consumption decreased by almost 50% from this one function alone. The rewrite has such a significant effect because compiled C code avoids MATLAB's interpreter overhead. Furthermore, the match-cost function can be called more than 10 million times across the propagation and random-search steps, so even a minor improvement is magnified into an obvious change in performance.
Second, the total AEE of custom-cpm is more than twice that of CPM-flow on the clean set and slightly less than twice on the final set. Compared with EpicFlow, custom-cpm's AEE is still about twice as high on the clean set and about 50 per cent higher on the final set. The visualised results on the benchmark also suggest gaps between custom-cpm and the other methods.
In Figure 6, we can see from the first column that custom-cpm performs better on the stick image. CPM-flow either does not find an appropriate match for the stick or treats it as an outlier and removes it; EpicFlow, on the contrary, overfits on the stick, so many outliers are retained. Compared to CPM-flow, custom-cpm achieves poor accuracy at the edges of the image, such as the bottom-left and top-right corners, and has a pale area on the left side of the stick. In the second column, custom-cpm shows obvious defects on the left edge of the image and on the right side of the girl's body; these parts display a clearly different texture and an unclear boundary with their neighbourhoods. However, custom-cpm still performs commendably in the area between the two feet. In addition, it produces a clear boundary rather than the obscured boundary that CPM-flow and EpicFlow display. In the third column, custom-cpm shows remarkable strength in preserving details. For instance, there are a few birds in the top centre of the image; custom-cpm preserves more than half of them, while CPM-flow and EpicFlow preserve none. Nevertheless, custom-cpm still fails to match the largest bird at the top right.
To determine why custom-cpm performs poorly in the aforementioned areas, I checked the original images and compared the semi-dense flows of custom-cpm and CPM-flow before interpolation by EpicFlow. The comparison is shown in Figure 7.
In Figure 7, owing to the different representation methods, the white in the first two rows means the same as the black in the third row: both indicate the removal of outliers. The first row uses a small consistency-check factor, which removes a large percentage of outliers but also discards many correct correspondences, for example on the left side of the body in the top-left image and at the right boundary of the top-right image. If we increase the factor, however, many outliers are preserved along with the correct correspondences. Thus, after interpolation, custom-cpm performs poorly on the left boundary of the middle-left image because the outliers degrade the accuracy of the interpolation. This dilemma may be attributable to the lack of a reliable occlusion handling method: the occlusion handling in custom-cpm simply compares the flow values between the forward and backward results, which may not be enough.
Furthermore, there is another issue with the interpolation process. As mentioned previously, the white pixels in the first two columns and the black pixels in the third column mean the same thing: both indicate removed outliers and empty flow values. Despite using the same interpolation method (EpicFlow), the generated outputs differ in the empty regions (shown in Figure 8). Custom-cpm has a large empty area in the top-middle position, as does CPM-flow; however, the interpolation for CPM-flow produces a much better result than the interpolation for custom-cpm. On the left edge of CPM-flow there is an empty area caused by occlusions, yet this area is interpolated perfectly by EpicFlow. Meanwhile, on the right edge of custom-cpm the situation is almost the same, with an empty area that needs to be interpolated, but EpicFlow only blurs the border between the white and cyan regions. This is difficult to explain, as EpicFlow does not provide any API for changing the interpolation parameters. Thus, I had no choice but to compromise by increasing the consistency-check factor so as to retain as many correct correspondences as possible. By the time I identified this problem with occlusions and large-displacement motion scenes, there was not much time left to find another interpolation method or incorporate a more reliable occlusion handling method.
Overall, custom-cpm is slow but preserves details in video frames well. However, due to the lack of a reliable occlusion handling method, its accuracy is relatively poor in occluded regions and large-displacement motion scenes. This disadvantage could perhaps have been overcome with more time to try other methods and improve custom-cpm itself.
Chapter 5. Future work
As my project did not achieve satisfactory performance in optical flow estimation for large-displacement motion, there is significant work that can be done in the future to extend the project.
Compared to CPM-flow, my project still has three obvious defects on which future work can focus. The first is time consumption: so far, the time consumption is five times that of CPM-flow. At least two areas can be improved, namely the data structures and the programming language. As the matching process runs around 10 million times in total, a highly efficient data structure would no doubt make searching and matching more effective. Furthermore, as mentioned in the previous section, porting the MATLAB code to C/C++ could significantly decrease the time consumption. The second defect is the outlier handling method, which struggles to distinguish correct correspondences from outliers. A more reliable occlusion handling method could help, because most wrong correspondences tend to appear in occluded areas. The third defect is the poor interpolation in empty areas, which may be because the empty areas are ill-defined; alternatively, the EpicFlow interpolation method may need to be modified. In addition, other descriptors that carry more information could be used in the matching process to improve the accuracy of the generated matches. Future work can address at least these three defects.
Chapter 6. Conclusion
In conclusion, my project has been inspired by [7] and implements a coarse-to-fine
PatchMatch optical flow estimation algorithm in MATLAB. I conducted a series of
experiments to optimise the implementation as well as analyse the effect of parameters.
This project combines coarse-to-fine hierarchical architecture and PatchMatch with
propagation and random search to decrease the noise and introduce global
regularization. It also uses SIFT descriptors to characterise the seed pixel in order to
obtain accurate correspondences. Finally, EpicFlow interpolates the semi-dense optical flow into a dense optical flow. Unfortunately, due to the limited efficiency of this implementation, real-time optical flow estimation cannot be realised.
Compared to the original CPM-flow, my implementation can estimate large
displacement motion to some extent, but performs poorly when the large displacement
is accompanied by occlusions. Nevertheless, it has its own advantage in that it can
preserve many details and tiny structures, which is a significant benefit considering
both CPM-flow and EpicFlow perform poorly in terms of detail preservation. Moreover,
although the time consumption is higher than that of CPM-flow, it is still faster than
many other dense optical flow estimation methods.
In the future, studies should aim to improve the computational efficiency and
incorporate an occlusion handling method so as to improve the accuracy on the
boundaries and occlusion areas.
References
[1] L. Bao, Q. Yang and H. Jin, "Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow", IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 4996-5006, 2014.
[2] C. Barnes, E. Shechtman, A. Finkelstein and D. Goldman, "PatchMatch: A randomized correspondence algorithm for structural image editing", ACM Transactions on Graphics, vol. 28, no. 3, p. 1, 2009.
[3] T. Brox and J. Malik, "Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 500-513, 2011.
[4] C. Liu, J. Yuen and A. Torralba, "SIFT Flow: Dense Correspondence across Scenes and Its Applications", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978-994, 2011.
[5] P. Dollar and C. Zitnick, "Fast Edge Detection Using Structured Forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1558-1570, 2015.
[6] B. Horn and B. Schunck, "Determining optical flow", Artificial Intelligence, vol. 17, no. 1-3, pp. 185-203, 1981.
[7] Y. Hu, R. Song and Y. Li, "Efficient Coarse-to-Fine PatchMatch for Large Displacement Optical Flow", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5704-5712, 2016.
[8] L. Xu, J. Jia and Y. Matsushita, "Motion Detail Preserving Optical Flow Estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1744-1757, 2012.
[9] Y. Li, D. Min, M. Brown, M. Do and J. Lu, "SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs", IEEE International Conference on Computer Vision (ICCV), pp. 4006-4014, 2015.
[10] J. Revaud, P. Weinzaepfel, Z. Harchaoui and C. Schmid, "EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
Appendix 1
Appendix 2
Appendix 3
Here is the file list of my software. My own work includes "collection_of_data_processing.m", "demo.m", "init_match.m", "main.m", "pyramid.m", "random_search.m", "custom_match.mexw64" and "seed_set.m".
The files "computeColor.m", "flowToColor.m", "readflo.m" and "readFlowFile.m" are based on Deqing Sun's work.
The files "desc_generate.m", "mexDenseSIFT.mexw64" and "mexDiscreteFlow.mexw64" are taken from the online code of SIFT Flow: C. Liu, J. Yuen and A. Torralba, "SIFT Flow: Dense Correspondence across Scenes and its Applications", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2010.
Other files are prepared for demos and testing.
At the bottom of the "demo.m" file, there is a piece of code for a general test of correctness. This test requires a ground-truth "*.flo" file. I have provided three ".flo" files for testing: "frame_0005.flo", "frame_0017.flo" and "frame_0040.flo", corresponding to three groups of images. For example, when estimating the flow for the image pair "frame_0005.png" and "frame_0006.png", we can uncomment the code at the bottom of "demo.m" and change the ground-truth path to "frame_0005.flo". Running the demo then displays the average end-point error in MATLAB.
A more accurate test requires using EpicFlow to generate the dense optical flow and then using the "compute AEE" code in "collection_of_data_processing.m".
As noted above, to conduct an experiment, modify the paths of "image1" and "image2" in the "demo.m" file; the struct "model" is also modifiable. For a rough end-point error, use the code at the bottom of "demo.m"; for a more accurate comparison, use the "compute AEE" module in "collection_of_data_processing.m".
All of my code was written and tested on Windows 10 64-bit with MATLAB R2015b. The MEX module in MATLAB is required.
Appendix 4
README:
This is the software package for my individual project for COMP4560 at the Australian National University:
Qiang Duan, Efficient Dense Optical Flow Estimation from Two-Images, 27th October 2016
Email: [email protected]
Usages:
All the MATLAB files were tested under Windows 10 64-bit on an Intel i7-6700K CPU.
With default parameters, it takes about 22 s to compute the optical flow for a 436*1024 image.
This software requires the MEX module to compile and invoke some functions.
Run 'demo.m' first. The parameters in the struct "model" can be modified.
After the result is derived, use 'collection_of_data_processing.m' to generate a .txt file of flow values, and generate an edge file by running the SED_edge toolbox (not provided here). Combining the .txt file and the edge file, EpicFlow can generate the final dense optical flow.
'main.m' is the function containing the main part of this software.
'random_search.m' implements the propagation and random-search steps.
'desc_generate.m' generates the descriptors.
'init_match.m' generates the initial seed positions.
'pyramid.m' builds the image pyramid.
'seed_set.m' generates the seed set containing the pixels' positions.
'custom_match.mexw64' is a MEX function that computes the match cost between two descriptors.
Reference:
@inproceedings{Butler:ECCV:2012,
  title     = {A naturalistic open source movie for optical flow evaluation},
  author    = {Butler, D. J. and Wulff, J. and Stanley, G. B. and Black, M. J.},
  booktitle = {European Conf. on Computer Vision (ECCV)},
  editor    = {{A. Fitzgibbon et al. (Eds.)}},
  publisher = {Springer-Verlag},
  series    = {Part IV, LNCS 7577},
  month     = {oct},
  pages     = {611--625},
  year      = {2012}
}
@inproceedings{revaud:hal-01142656,
  title     = {{EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow}},
  author    = {Revaud, Jerome and Weinzaepfel, Philippe and Harchaoui, Zaid and Schmid, Cordelia},
  booktitle = {{Computer Vision and Pattern Recognition}},
  year      = {2015}
}
C. Liu, J. Yuen and A. Torralba. SIFT Flow: Dense Correspondence across Scenes and
its Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), 2010.