
Object tracking methods and their areas of application: A meta-analysis

A thorough review and summary of commonly used object tracking methods

Sanna Agren


Master Thesis in Computing Science, 30 credits

Supervisor: Patrik Eklund

Examiner: Henrik Bjorklund


Abstract

Object tracking is a well-studied problem within the area of image processing. The ability to track objects has improved drastically during the last decades; however, it is still considered a complex problem to solve. The importance of object tracking is reflected by the broad area of applications such as video surveillance, human-computer interaction, and robot navigation.

The purpose of this study was to examine, evaluate, and summarize the most common object tracking methods. In this paper a thorough review of the object tracking process is presented. This includes selection of object representation, object features, methods for object detection, and methods for tracking the object over succeeding frames. A summary of the object tracking methods covered in this paper is presented in the result section, including advantages, disadvantages, and the contexts for which each method is suitable.

Methods for object tracking and their applications: A meta-analysis

Summary

Object tracking is a well-studied problem within image processing. The ability to track objects has improved drastically over the last decades, but it is still considered a complex problem to solve. The importance of effective object tracking is reflected by its broad area of application, such as video surveillance, human-computer interaction, and robot navigation.

The purpose of this study was to examine, evaluate, and summarize the most common object tracking methods. This includes the choice of object representation, object features, methods for object localization, and methods for tracking objects over succeeding frames. A summary of the object tracking methods included in this study is presented in the result, containing advantages, disadvantages, and the contexts in which the methods are suitable to use.


Acknowledgements

First of all I would like to thank Johanna Bjorklund and Rickard Lonneborg at Codemill for allowing me the opportunity to write my thesis at their company. I would also like to thank my company supervisor Martin Wuotila Isaksson who has given me great advice in the process of completing my thesis. My university supervisor, Patrik Eklund, has also been of great help and always taken the time to answer any questions I had. Last but not least I would like to thank my partner Emil Nylind for always supporting me.


Contents

1 Introduction

1.1 Purpose

1.2 Problem formulation

1.3 Delimitation

2 Background

2.1 Related work

3 Method

3.1 Data collection

3.2 Data analysis

3.3 Source criticism

4 The tracking process

4.1 Object representation

4.1.1 Shape representation

4.1.2 Appearance representation

4.2 Feature selection

4.2.1 Edges

4.2.2 Optical flow

4.2.3 Color

4.2.4 Texture

4.3 Object detection

4.3.1 Point detectors

4.3.2 Background subtraction

4.3.3 Segmentation

4.3.4 Supervised learning

4.3.5 Temporal differencing

4.4 Object tracking

4.4.1 Point tracking

4.4.2 Kernel tracking

4.4.3 Silhouette tracking

5 Result

6 Discussion

7 Conclusions

7.1 Future work

7.2 Concluding remarks

References

A Summary table of result

B List of concepts


Chapter 1

Introduction

It is by looking and seeing that we come to know the world we live in. The environment that surrounds us is filled with endless types of objects and impressions. Vision is, in other words, a means of gaining an understanding of the world around us. Exactly how the visual system works remains a mystery to be solved, even though physiologists have been investigating the phenomenon for decades. When speaking about vision, by replacing the living creature with a computational instrument, we have the broad and abstract expression computer vision. It can be summarized as the process of computers analysing digital images or videos and gaining a high-level understanding from them [42].

Object tracking is an area within computer vision which has many practical applications such as video surveillance, human-computer interaction, and robot navigation [19]. It is a well-studied problem, and in many cases a complex problem to solve. The problem of object tracking in video can be summarized as the task of finding the position of an object in every frame [27]. The ability to track an object in a video depends on multiple factors, such as knowledge about the target object, the type of parameters being tracked, and the type of video showing the object [14].

Object tracking is an important part of human-computer collaboration in a continuous environment, in the sense of allowing the computer to obtain a better model of the real world. One example is the application area of autonomous vehicles, where it is not possible for a human to communicate the state of the environment accurately and quickly enough given the requirements of the agent.

The broad area of application reflects the importance of reliable, exact, and effective object tracking. There are several important steps towards effective object tracking, including the choice of a model to represent the object and of an object tracking method suitable for the task.

1.1 Purpose

The purpose of this thesis was to conduct a review of common techniques for object tracking at a high level of abstraction, including advantages, disadvantages, and suitable areas of use. By making a thorough and deep analysis of techniques and presenting these in a comprehensible overview, the hope is that the result of the thesis can be used to quickly and easily determine suitable methods of object tracking, depending on the application and the purpose of the tracking. The thesis aimed to present information usable for the company presenting the problem as an informed recommendation of different object tracking techniques.

1.2 Problem formulation

The problem concerned the ability to recognize different objects in a video, where the objects can be rigid (e.g. a container) as well as non-rigid (e.g. clothing). Object tracking could be used to automate the process of presenting information about a certain object displayed in a video instead of manually having to search for the object in, for instance, a product catalog.

The thesis aimed to, as far as possible, answer the questions below.

• What methods for object tracking/image recognition in video are there?

• What separates the methods, and what are the benefits and disadvantages with these?

• Are different methods more suitable for certain applications or environments?

1.3 Delimitation

The aim of this report was to make a comprehensive summary of different techniques which could be used to quickly and easily decide which methods are suitable, or not suitable, for the chosen area of application. With this in mind it was concluded that the comparative work should focus on object tracking groups, or categories, and not on specific algorithms. Since there exist many different approaches, methods, and variants of methods for object tracking, it would be an impossible task to thoroughly cover all methods used in some object tracking implementation, especially since there are proprietary methods whose implementations and designs are not available for study. This study focused on the categories or groups of object tracking most common in the literature, for rigid as well as non-rigid objects.

Aside from object tracking techniques, this thesis also includes other important parts of the tracking process. Since the aim of this thesis is to review methods for object tracking, and there are time requirements to meet, the result section will only include a summary of the material presented in section 4.4, Object tracking.

Chapter 2

Background

Some of the first occurrences of computer vision related work took place in the early 1970s. Researchers wanted to be able to mimic human intelligence in computers, and computer vision was considered a visual perception component in this agenda. The task of solving the vision input problem, i.e. feeding a computer with visual input and making it describe what it sees, was believed by some pioneers in the area of artificial intelligence to be an easy step along the way to solving more difficult and interesting problems [37]. One example of this is The summer vision project, where the undergraduate student Gerald J. Sussman was asked by the Artificial Intelligence Group at MIT to perform a summer project. The project aimed to make a computer describe what it saw by linking a camera to it [25]. As it turned out, this task would need several more decades of research.

Digital image processing was already an existing field. However, the wish to get a full scene understanding brought forth the desire to use images to recover the three-dimensional structure of the world, which was considered a step towards the ultimate goal. Some of the first attempts at this kind of scene understanding were made by edge extraction and translation of the 2D lines to 3D structures. During this time some work was also performed in interpreting factors such as color intensities and shade variations to be able to explain them in terms of image phenomena like surface orientation and shadows [37].

In the 1980s the focus shifted to performing quantitative image analysis. A lot of effort was put into developing sophisticated mathematical techniques for this purpose [37]. Research was also performed to improve the ability to detect objects in images, with methods like edge and contour detection [5], as well as introducing the concept of evolving contour trackers such as snakes [16]. Continued work led researchers to the discovery that a lot of the proposed algorithms could be treated within the same optimization framework. In other words, they discovered that several algorithms could be described using the same mathematical framework if they were posed as different optimization problems [37].

During the following decade researchers continued to explore relevant topics, some of which became more interesting than others. A lot of important work, not least for the ability to track objects, was performed during this period, when various tracking algorithms were improved drastically. These included contour tracking using active contours such as snakes, and particle filters. Also worth noting is that interaction with computer graphics increased [37]. To explore the idea of creating animations using images of the real world the technique of image morphing was used [3], and later other techniques such as view interpolation. A lot of progress was also made in important fields such as optical flow methods, global optimization using graph cut techniques, and image segmentation. During the 2000s the work on developing techniques in the area of computer vision has continued. Among other areas we have seen increased interest in complex global optimization problems, where the focus has been on the development of more efficient algorithms [37].

Tracking objects in video has made great progress in some categories, for instance humans, faces, and animals. Despite this it remains a challenge to track generic objects since their visual appearance can change from one moment to another due to, for example, movement or light changes [14]. Even though we still have a long way to go and many areas to explore further, there has been great progress in object tracking techniques over the years. Object tracking is today used in applications such as video surveillance, human-computer interaction, robot navigation, activity recognition, anomaly detection, virtual reality, object navigation, and path detection [14, 19].

2.1 Related work

There are three basic steps in video analysis: object detection, object tracking, and recognition of object activities by analysing their tracks [2]. Concerning the tracking process there are many different algorithms which can be divided into broad groups, or categories. One common way is to divide the methods into three main groups, which are point tracking, kernel tracking, and silhouette tracking [6, 24, 28, 35, 48]. Other ways to categorize methods in object tracking are region-based tracking, contour-based tracking, and boundary-based tracking [8, 13, 22], even though these do not occur as commonly in the literature.

Chapter 3

Method

To be able to find and systematically review state-of-the-art object tracking techniques in a structured, objective, and scientifically valid way, a literature study and an analytic review of existing techniques have been performed.

A literature study, or literature review, is a method to make a summary of the research findings that others have reported. A study of this kind does not report any new information or research [1].

3.1 Data collection

During the initial work stage a problem formulation was defined in consultation with the company presenting the problem. The problem formulation was used to define search keywords for the literature collection stage. The search keywords provided a foundation for the collection of scientific papers. Since one of the aims was to find out which object tracking techniques exist today, the search keywords were kept broad so as not to exclude any important information. They aimed to capture the methods for object tracking most common in the literature. The search keywords were:

• Object tracking

• Video tracking

• Methods

• Categorization

• Video

The search keywords were combined in different compositions called keyphrases. The collection of literature was done using Google Scholar and the article database available from the Umeå University library homepage. The prerequisite for the considered papers was that they needed to be available in English and have a full-text option free of charge.

The literature collection was carried out in parallel with the rest of the work throughout the study. Collecting literature has been a continuous process aiming to deepen and increase the analytic precision.

3.2 Data analysis

A careful and systematic review of scientific papers has been necessary to achieve a reliable result. Analysis of papers containing information important to the end result has been performed in several steps.

The keyphrases generated many hits on scientific papers that were considered relevant for the work to be done. Whether a paper was relevant was first judged based on title and abstract. The next step in the process was to briefly read the full article to get a general understanding of the material, which was done several times if needed. In some cases further reviewing of papers led to exclusion if the content was not as relevant as first perceived. The last step in the collection phase was to categorize the papers based on content. They were also ordered based on relevance and perceived importance in the research area of object tracking. Similarities between categories in different papers made it possible to compare the material and make a summary relevant for the purpose of this study.

The process can be summarized as follows:

1. Choose search keywords.

2. Use search keywords in different combinations, called keyphrases.

3. Choose relevant papers for a keyphrase, based on title and abstract. This is called selection 1.

4. Exclude any papers that are not as relevant as first perceived. This is called selection 2.

5. Categorize papers based on content.

6. Sort papers based on relevance and research importance.

3.3 Source criticism

It is always important to have a critical attitude towards information presented to you. In this study the reliability of the source material was evaluated by the following criteria:

• Who published the information? Is it a person, organization or a company?

• For what purpose was the information published?

• Does the publisher have a financial stake in the result?

• When was the information published, and is it up-to-date?

• Is the paper supported by sources and references that are relevant?


Chapter 4

The tracking process

Tracking an object in a video can be expressed as the process of finding the motion path of an object over time, by locating its position in every frame of the video [48]. Object tracking has emerged as one of the most popular research subjects within the area of computer vision. Although it is a well-studied problem, it remains a challenge in many aspects. Building trackers for specific object classes like humans or faces has made great progress over the years, while trackers for generic objects continue to be a difficult area. Objects that drastically change appearance due to e.g. distension or light changes, noise in images, complex motion, and complex object shapes are some examples of why generic object tracking is considered to be a challenging problem [14, 48].

Imagine looking at an arbitrary moving object that suddenly changes its motion pattern, or appearance features like shape or color. Add to this multiple other objects in the scene that might behave in a similar way. Keeping track of the object would in most cases be a difficult task, even for the human eye. This suggests that it is an almost impossible task for a computer to manage.

Having said this, there are ways to simplify tracking by applying some constraints. Almost all algorithms assume that the motion of the object being tracked is smooth, with no sudden changes. Constant velocity or acceleration as well as prior knowledge about the object appearance are other ways of simplifying the task [48].
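As an illustration of how a constant velocity constraint simplifies the task, the following minimal sketch predicts the next position of a point from its two most recent positions and only accepts candidate detections close to that prediction. The function names and the gating radius are hypothetical and chosen for this example; they are not taken from any of the reviewed papers.

```python
def predict_next_position(prev, curr):
    """Predict the next (x, y) position assuming constant velocity."""
    vx = curr[0] - prev[0]
    vy = curr[1] - prev[1]
    return (curr[0] + vx, curr[1] + vy)


def within_gate(predicted, candidate, radius):
    """Accept a candidate detection only if it lies close to the prediction."""
    dx = candidate[0] - predicted[0]
    dy = candidate[1] - predicted[1]
    return dx * dx + dy * dy <= radius * radius


# Positions of the tracked point in the two most recent frames.
predicted = predict_next_position((10, 20), (13, 24))
```

A tracker built on this assumption only has to search a small neighbourhood around the prediction, at the cost of losing objects that change direction abruptly.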

The process of building an object tracker is usually divided into several steps, which are object representation, object detection, and object tracking [35]. This chapter contains a thorough review of different methods for object tracking, which is the focus area of this thesis. However, it also briefly covers the areas of object representation and object detection, since these are important steps in the process of successful object tracking. This way the reader is provided with a better basis for understanding the process of building an object tracker.

4.1 Object representation

To be able to track an object of interest it first needs to be represented in a way that makes sense to a computer. Properties regarding appearance and shape are usually used as a basis for the representation. Appearance representation or shape representation can be used alone to represent an object, but they can also be used in combination for this purpose. External factors like the application domain, purpose, and goal determine how the object, or objects, should be represented. The representation in turn determines the choice of a suitable algorithm for the tracking [48].

To summarize, object representation is usually defined to consist of a shape representation and/or an appearance representation. This section starts with a brief description of different methods regarding shape representation, which is followed by a description of methods for appearance representation.

4.1.1 Shape representation

There are several methods for representing, or describing, the shape of an object so that it is possible to make calculations for detection and tracking, where some representations are more suitable for certain types of objects. It is possible to find advantages as well as disadvantages with every type of representation model; these often vary depending on the application domain and type of object.

Common shape representations include points, geometric shapes, silhouettes, contours, articulated shape models, and skeletal models [48]. These shape representations, together with a brief description of their area of use, are introduced below.

Points

The object of interest is represented either by a single point (figure 1(a)) or by a set of points (figure 1(b)). Using a set of points to represent objects can be problematic when tracking multiple objects in a video, in the case of interaction between objects, e.g. partial or full occlusion. Difficulty in keeping track of which point belongs to which object can easily cause misdetections. Because of this, point representation is more suitable for small, simple objects that can be represented using a single point.

Geometric shapes

Primitive geometric shapes such as a rectangle or ellipse are used as shape representation (figure 1(c), figure 1(d)). Using simple shapes as representation is a common approach for both rigid and non-rigid objects, although it is more suitable for simple, rigid objects. Objects in videos are usually not quite as simple and exact as these types of shapes. It is therefore common that parts of the object are left outside the shape template, or that parts of the background are included in it, which could cause tracking problems.

Silhouette and contour

Using the outline or boundary of an object to represent it is called contour representation (figure 1(g), figure 1(h)). Silhouette representation means using the region inside the contour (figure 1(i)). Using contour or silhouette as representation makes it easier to represent complex and/or non-rigid objects. It is a flexible model with the ability to represent many different object shapes.

Articulated shape models

An object structured by different parts which are held together by joints is called an articulated object. This can be used when representing, for instance, a human being with parts like head, torso, arms, legs, hands, and feet (figure 1(e)). Each part can be represented using simple geometric shapes like ellipses, which is a model covered above.

Skeletal model

Using the silhouette of an object and applying the medial axis transform makes it possible to extract the object skeleton (figure 1(f)). This is a model commonly used in object recognition, but it does not occur as commonly in the literature regarding object tracking.

Figure 1: Different approaches regarding shape representation [48]


4.1.2 Appearance representation

Similar to the shape representation there are numerous methods of representing an object by appearance. Some common ways of representing an object are using probability densities, templates, active appearance models, and multiview appearance models [48]. These are listed and explained below.

Probability densities of object appearance

A probability density function describes the probability of a random variable falling within a particular range of values [39]. By using the interior region of an image specified by the shape model, for instance by a contour, an estimation of the probability densities of object appearance features can be computed. The appearance features can be, for example, color or texture. The probability densities can be either parametric, such as a Gaussian distribution (or normal distribution), or nonparametric, such as histograms.
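The difference between the two kinds of density estimate can be sketched on a handful of intensity values taken from an object region. This is a minimal illustration under assumed inputs: the pixel values, the number of bins, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def intensity_histogram(pixels, bins=4, max_value=256):
    """Nonparametric estimate: normalized histogram of pixel intensities."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        # Clamp so that the maximum intensity falls in the last bin.
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def gaussian_parameters(pixels):
    """Parametric estimate: mean and standard deviation of a Gaussian."""
    return mean(pixels), pstdev(pixels)


# Hypothetical grayscale values from the interior of an object region.
region = [12, 40, 70, 75, 130, 135, 140, 250]
hist = intensity_histogram(region)
mu, sigma = gaussian_parameters(region)
```

The histogram keeps the shape of the distribution at the cost of one value per bin, whereas the Gaussian summarizes the same region with only two numbers.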

Templates

Silhouettes or simple geometric shapes are used to form templates. The templates can carry both spatial and appearance information, which is an advantage of the method. Using templates can be problematic for objects that look drastically different from different views, since the templates only encode the appearance features from one view. This makes the model more suitable for objects whose poses do not vary. Problems can also arise when the appearance features change noticeably during the tracking; one example is how color features are sensitive to illumination changes.

Active appearance models

A set of landmarks, which can reside either on the object boundary or inside the object region, defines the object shape. The object appearance is simultaneously modeled by storing an appearance vector for each landmark. This can be in the form of color, texture, or gradient magnitude. This model does, however, require a training phase where shape and associated appearance are learned from a set of samples.

Multiview appearance models

Unlike templates, these models encode different views of the object. There are different approaches for doing this; one example is generating a subspace from the given views. Examples of subspace approaches that have been used for this purpose are Principal Component Analysis (PCA) and Independent Component Analysis (ICA) [4].

4.2 Feature selection

An important part of object tracking is to select features of the object that make it distinguishable. This process is usually done manually. The choice of object representation strongly affects the feature selection since these are closely related. For example, object edges are commonly used as features when using contour-based representation [27]. Some common feature selections are listed below.

4.2.1 Edges

The border between an object and the background is usually easy to distinguish. It is, in other words, usually not hard for the human eye to locate the boundaries of objects in an image. This is due to the fact that boundaries generate strong changes in the image intensities. Using edges as the representing feature then allows tracking algorithms that track boundaries of an object to recognize these changes in intensity. Edges are less sensitive to changes in illumination than, for instance, color [48].
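The link between boundaries and intensity changes can be sketched with a simple gradient computation. The example below is a minimal illustration using central differences on a small synthetic grayscale image; the function names, the image, and the threshold are assumptions made for this example, and practical edge detectors add smoothing and thinning steps.

```python
def gradient_magnitude(image):
    """Per-pixel gradient magnitude using central differences.

    Border pixels are left at 0.0 for simplicity.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def edge_pixels(image, threshold):
    """Return (x, y) coordinates whose gradient magnitude exceeds threshold."""
    mag = gradient_magnitude(image)
    return [(x, y) for y, row in enumerate(mag)
            for x, m in enumerate(row) if m > threshold]


# Dark background (10) on the left, bright object (200) on the right.
image = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
edges = edge_pixels(image, threshold=50)

# A uniform brightness offset leaves the intensity differences unchanged.
brighter = [[p + 30 for p in row] for row in image]
edges_brighter = edge_pixels(brighter, threshold=50)
```

In this idealized case the detected edge pixels are identical before and after the brightness offset, which is one way to see why edges are less sensitive to illumination than raw color values.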

4.2.2 Optical flow

Optical flow refers to the apparent motion of brightness patterns in a visual scene, and is sometimes referred to as a motion feature. Optical flow is a visual phenomenon that almost all humans experience each day. When driving a car and looking out the window, objects outside such as trees and buildings appear to be moving backwards. This apparent motion is optical flow. Calculating the apparent motion is done by identifying each pixel's movement between frames. The brightness constraint is used in these calculations, which means that corresponding pixels in different frames are consistent in brightness. An example of this can be seen in figure 2 [36].

Figure 2: Brightness remains the same in a specific region, even though the location of the pixels may have changed [36]
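The brightness constraint is commonly written as follows. The notation is standard but is an assumption of this sketch and is not taken from the source: $I(x, y, t)$ denotes the image intensity at position $(x, y)$ and time $t$, subscripts denote partial derivatives, and $(u, v)$ is the flow vector of the pixel.

```latex
% Brightness constancy: a pixel keeps its intensity while it moves.
I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t)

% First-order Taylor expansion of the right-hand side, divided by \Delta t:
I_x u + I_y v + I_t = 0,
\qquad u = \frac{\Delta x}{\Delta t}, \quad v = \frac{\Delta y}{\Delta t}
```

This is one equation in the two unknowns $u$ and $v$, so optical flow methods need an additional assumption, such as smoothness of the flow within a small neighbourhood, to obtain a unique solution.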

4.2.3 Color

The data from different frames can be stored in different color spaces. In image processing color is usually represented by RGB (red, green, blue), but it is sometimes represented by other color spaces such as YCbCr and HSV. One problem with using color as feature representation is the sensitivity to changes in illumination, since the apparent color of an object is directly affected by the illumination factor. In addition to illumination, the apparent color is also affected by the reflectance properties of the object [48].
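The effect of the choice of color space can be illustrated with the standard library module `colorsys`. The sketch below assumes an idealized illumination change in which all three RGB channels are scaled by the same factor; real illumination changes are rarely this simple, and the color values are hypothetical.

```python
import colorsys


def to_hsv(rgb):
    """Convert an 8-bit (r, g, b) triple to (h, s, v) with values in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


bright = (200, 100, 50)
dimmed = tuple(c // 2 for c in bright)  # same surface under half the light

h1, s1, v1 = to_hsv(bright)
h2, s2, v2 = to_hsv(dimmed)
```

All three RGB values change between the two samples, while in HSV the change is confined to the value channel and hue and saturation stay the same. Under the stated assumption this shows why a tracker might compare colors in a space that separates brightness from chromaticity.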

4.2.4 Texture

Properties like smoothness and regularity are relevant and can be quantified by measuring the intensity variation of a surface. A processing step is required to generate the descriptors, which exist in several different forms [48]. One texture descriptor is Laws' texture measures, which involve filters corresponding to level, edge, spot, wave, and ripple [17].
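A minimal sketch of how such filters respond to texture is given below. The one-dimensional coefficient vectors are the ones commonly cited for Laws' measures and are stated here as an assumption, since the source does not list them; the function names and test patches are hypothetical. A two-dimensional mask is formed as the outer product of two vectors.

```python
# Commonly cited 1D Laws vectors (assumed, not listed in the source).
LAWS_VECTORS = {
    "L5": [1, 4, 6, 4, 1],     # level
    "E5": [-1, -2, 0, 2, 1],   # edge
    "S5": [-1, 0, 2, 0, -1],   # spot
    "W5": [-1, 2, 0, -2, 1],   # wave
    "R5": [1, -4, 6, -4, 1],   # ripple
}


def laws_mask(row_name, col_name):
    """Build a 5x5 mask as the outer product of two 1D vectors."""
    a, b = LAWS_VECTORS[row_name], LAWS_VECTORS[col_name]
    return [[ai * bj for bj in b] for ai in a]


def mask_response(patch, mask):
    """Sum of elementwise products of a 5x5 patch and a 5x5 mask."""
    return sum(patch[i][j] * mask[i][j] for i in range(5) for j in range(5))


flat = [[50] * 5 for _ in range(5)]                  # no texture
stripes = [[0, 100, 0, 100, 0] for _ in range(5)]    # fine vertical stripes

level_ripple = laws_mask("L5", "R5")
flat_response = mask_response(flat, level_ripple)
stripe_response = mask_response(stripes, level_ripple)
```

The flat patch gives a zero response while the striped patch gives a response of large magnitude. A texture descriptor is then typically formed from the absolute or squared responses of several such masks, averaged over a window.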

4.3 Object detection

To be able to track an arbitrary object it first needs to be detected and identified in the video sequence. An object detection mechanism of some sort is always needed regardless of the tracking method used. However, there are two different approaches for this. The detection can be done either in every frame or when the object first appears in the video [48]. There are many different methods for detecting objects. This section covers some of the most popular methods in this area.

4.3.1 Point detectors

This method refers to the detection of interest points in images. An interest point should preferably be stable under changes in scene illumination and camera viewpoint [20].

In the literature it is common that the terms corner and interest point are used more or less interchangeably, which sometimes causes confusion. The intersection of two edges is one way to define a corner; another way is as a point for which two edges in a local neighbourhood of the point have different directions. An interest point is a well-defined point in an image which can easily be detected. This means that a point of interest can be a line ending, a point of maximum on a curve, or a local intensity maximum or minimum, but it can also be a corner. Even though there exist methods for so-called corner detection, these in practice detect interest points in general and are therefore included in the category of point detectors [47].

There exist many different methods for point and corner detection, where one of the most popular methods is the Harris detector, which exists in several variants. The detection is done by computing the eigenvalues of a certain matrix which can be viewed as the scatter matrix of the image gradient computed over a small region of the image. Corners are detected by comparing the eigenvalues with a threshold value [47].
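The eigenvalue test described above can be sketched as follows. This is a simplified illustration rather than a full Harris implementation: it thresholds the smaller eigenvalue directly, a variant often attributed to Shi and Tomasi, whereas the classic Harris response combines the determinant and trace of the same matrix. The function names, window size, test image, and threshold are assumptions of this example.

```python
import math


def gradients(image, x, y):
    """Central-difference gradient at (x, y)."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy


def scatter_matrix_eigenvalues(image, cx, cy, radius=1):
    """Eigenvalues of the 2x2 gradient scatter matrix around (cx, cy).

    The window must stay at least one pixel away from the image border.
    """
    sxx = sxy = syy = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            gx, gy = gradients(image, x, y)
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    return half_trace + disc, half_trace - disc


def is_corner(image, cx, cy, threshold):
    """A point is a corner if both eigenvalues are large."""
    _, smallest = scatter_matrix_eigenvalues(image, cx, cy)
    return smallest > threshold


# 9x9 synthetic image: bright square in the lower-right part.
image = [[100 if x >= 4 and y >= 4 else 0 for x in range(9)] for y in range(9)]
```

On this image the corner of the square at (4, 4) has two large eigenvalues, a point on its straight left edge such as (4, 6) has one large and one zero eigenvalue, and a flat point such as (2, 2) has two zero eigenvalues.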

4.3.2 Background subtraction

Background subtraction is used to separate a foreground object from the background. The basic approach when using this method is building a background model that represents the scene. The background model works as a reference and must therefore be continuously updated and contain no moving objects. By comparing each video frame against the background model it is possible to recognize moving objects in terms of deviations from the reference model [29].

The algorithms used for background subtraction are considered simple and straightforward to use; the method is, however, very sensitive to changes in the environment. Background subtraction techniques can be divided into two groups, recursive techniques and non-recursive techniques. Recursive techniques base the background model on every video frame by recursively updating the background model. As a result, the model can be affected by input frames processed in the distant past. Compared to non-recursive techniques this approach requires less memory, but any errors in the background model can linger for a longer period of time. Non-recursive techniques store a buffer with the last n video frames. The background model is then based only on the frames stored in the buffer; all other frames are irrelevant. The background model adapts faster, but the memory requirements grow with the size of the buffer [35].
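The trade-off between the two groups can be sketched with one simple representative of each. The running average and the median over a frame buffer used below are common choices but are assumptions of this example, as are the class names, the learning rate, and the threshold. Frames are represented as flat lists of grayscale values.

```python
from collections import deque
from statistics import median


class RunningAverageBackground:
    """Recursive model: only the current background estimate is stored."""

    def __init__(self, first_frame, alpha=0.1):
        self.alpha = alpha
        self.model = [float(p) for p in first_frame]

    def update(self, frame):
        a = self.alpha
        self.model = [(1 - a) * b + a * p for b, p in zip(self.model, frame)]


class MedianBufferBackground:
    """Non-recursive model: per-pixel median over the last n frames."""

    def __init__(self, n):
        self.buffer = deque(maxlen=n)  # memory grows with n

    def update(self, frame):
        self.buffer.append(list(frame))

    @property
    def model(self):
        return [median(values) for values in zip(*self.buffer)]


def foreground_mask(frame, model, threshold=30):
    """Mark pixels that deviate from the background model."""
    return [abs(p - b) > threshold for p, b in zip(frame, model)]


background = [20, 20, 20, 20]
with_object = [20, 200, 20, 20]

recursive = RunningAverageBackground(background, alpha=0.5)
mask = foreground_mask(with_object, recursive.model)

buffered = MedianBufferBackground(n=3)
# The object is absorbed into both models once, then the scene is empty again.
for frame in [with_object, background, background, background]:
    recursive.update(frame)
    buffered.update(frame)
```

After three empty frames the buffered model has forgotten the object entirely, while the recursive model still carries a trace of it at the second pixel. This mirrors the lingering errors and the memory trade-off described above.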

4.3.3 Segmentation

Segmentation is the process of partitioning an image into multiple segments, or regions. The produced segments collectively cover the entire image, and each segment is perceptually similar with respect to, for example, color or texture. This is helpful for locating objects and boundaries [37]. Figure 3 shows one example of how segments in an image can look.

There are different approaches for this method. One example is active contours, also called snakes, with the purpose of finding an object outline in an image. A snake is pulled towards image features like lines and edges by the guidance of external constraints combined with the influence of image forces; the effect can be seen in figure 4. Active contours alone do not solve the task of finding contours in an image. The method depends on other mechanisms, for instance interaction with users. The reason for this is that knowledge about the desired contour shape is needed beforehand [16, 32].

Figure 3: A segmented image [48]


Figure 4: The work of active contours, also called snakes [45]

4.3.4 Supervised learning

Instead of storing a complete set of templates to represent different views of objects it is pos-sible to obtain object detection by learning different object views automatically. Supervisedlearning methods uses some set of learning examples to generate a function that maps inputsto desired outputs. In other words, the computer is provided with learning examples whichcontains the expected answer defined by a human. Unclassified data can then be processedby the resultant function producing probable outputs, called classification. In the area ofobject tracking, the process of feature selection has an important role in the performance ofthe classification.

Several different learning approaches exist, such as neural networks, adaptive boosting, decision trees, and support vector machines [48].

4.3.5 Temporal differencing

The method of temporal differencing identifies pixel-wise differences between two or three consecutive frames. This method does not work well when an object moves slowly through a scene, since it fails to extract all relevant pixels. The same problem can occur for objects with uniform texture. The object is lost if it stops moving completely, since no change can be detected between frames [15].
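A minimal two-frame sketch shows both the technique and its weakness. It assumes grayscale frames given as lists of rows; the function name and the threshold value are hypothetical.

```python
def temporal_difference(prev_frame, curr_frame, threshold=20):
    """Return a binary motion mask for two equally sized grayscale frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# A uniformly textured object (intensity 100) moves one pixel to the right.
prev = [[0, 100, 100, 100, 100, 0, 0]]
curr = [[0, 0, 100, 100, 100, 100, 0]]
mask = temporal_difference(prev, curr)
```

In this example only the trailing and leading edge pixels change, so the interior of the object is not extracted, and comparing a frame with itself (a stationary object) gives an empty mask.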

4.4 Object tracking

There are many different algorithms for tracking objects. Some only handle single object tracking, while others explicitly handle occlusion to make it possible to track multiple objects. Algorithms used for object tracking can be divided into broad groups, or categories. One common way is to divide the methods into three groups: point tracking, kernel tracking, and silhouette tracking. These are usually divided into two subcategories each, see figure 5. The multi-view based method and the contour evolution method are in turn often further divided [24, 27, 28, 48]. However, the division of the contour evolution approach and the multi-view based approach is not relevant for the focus of this thesis and will therefore not be included.


Other ways to categorize object tracking methods are region-based tracking, contour-based tracking, and boundary-based tracking [8, 13, 22], although these are not as common in the literature.

Both contour-based tracking and boundary-based tracking describe the process of tracking objects using the border between the tracked object and the background, in other words the contour of the object [13, 22]. Tracking objects by their contour is usually described in the literature as a subgroup of the silhouette tracking category mentioned above. Based on this, the contour tracking method will be covered as a subsection of the silhouette tracking section.

Similarly, region-based tracking describes the process of tracking objects based on the complete object region [13, 22]. In the literature, silhouette tracking is occasionally called region tracking [48], and it is therefore covered in this report in the section on the silhouette tracking approach.

This section contains a thorough review of the tracking methods point tracking, kernel tracking, and silhouette tracking, as well as the subcategories shown in figure 5.


Figure 5: A common categorization of object tracking techniques [48].

4.4.1 Point tracking

The point tracking approach is used when detected objects are represented as points. Generally speaking, tracking is performed by evaluating their state in terms of position and motion [30].

Tracking is made possible by associating points across frames. The association from one frame to the next is based on the previous object state. Generally speaking, point correspondence is a complicated problem. This is especially true as complexity increases, for instance through occlusion, misdetections, and entries and exits of objects. As one can see in figure 5, the point tracking approach can be further divided into two subcategories, deterministic and statistical methods. The main difference between these two groups of correspondence methods is how they solve the problem of minimizing the correspondence cost [48].

The cost of matching every object in frame t − 1 to a specific object in frame t is called the correspondence cost. The constraints listed below are usually used in combination to lower the correspondence cost. An illustration of the constraints can be seen in figure 6.

• Proximity: Assumes that an object's location does not change drastically from one frame to another. (Figure 6(a))

• Maximum velocity: The velocity of an object is constrained within an upper bound. (Figure 6(b))

• Small velocity change: Assumes that an object's velocity does not change drastically from one frame to another. (Figure 6(c))

• Common motion constraints: The velocities of the objects in a small neighbourhood are constrained to be similar. (Figure 6(d))

• Rigidity: The object is assumed to be rigid, which means that the distance between any two points on the object will remain the same. (Figure 6(e))

• Proximal uniformity: A combination of the constraints for proximity and small velocity change [48].
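As an illustration of how such constraints can be turned into a number, the sketch below combines proximity, maximum velocity, and small velocity change into a single cost for linking a track to a candidate point. The function name, the weight w_velocity, and the limit max_speed are assumptions made for this example and do not come from [48].

```python
import math


def correspondence_cost(prev_pos, curr_pos, candidate, max_speed=50.0, w_velocity=1.0):
    """Cost of linking a track (prev_pos -> curr_pos) to a candidate in the next frame."""
    # Velocity observed so far and velocity implied by the candidate.
    vx, vy = curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1]
    nx, ny = candidate[0] - curr_pos[0], candidate[1] - curr_pos[1]

    displacement = math.hypot(nx, ny)  # proximity
    if displacement > max_speed:  # maximum velocity: reject the link
        return math.inf
    velocity_change = math.hypot(nx - vx, ny - vy)  # small velocity change
    return displacement + w_velocity * velocity_change
```

A candidate that continues the previous motion receives a lower cost than one at the same distance in a different direction, and a candidate beyond the speed limit is excluded altogether.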


Figure 6: Motion constraints [48]

Deterministic method

A system in which the development of future states does not involve any randomness is called a deterministic system. This means that a given start condition or initial state in a deterministic model will always produce the same output. In the context of object tracking, the deterministic approach means that object movements are assumed to follow some trajectory prototypes. There are several different approaches to creating a trajectory prototype [6]. One method when tracking people is to create the prototype offline, using ground-truth data. The result can be seen in figure 7. The prototype is based on four rules, listed below, that constrain a pedestrian's path [34]. This method could be problematic if the rules used are incorrect for the object's behaviour in the scene, for instance because of complex movements, obstacles changing the path, or changes in velocity.

Figure 7: The result of creating a prototype offline, using ground-truth data. The black lines show the pedestrians' paths, the red lines show the predicted paths [34]

1. The difference in a person's location between two consecutive frames is not too large.

2. The speed and direction of people are the same in all frames.

3. The movement of people should bring them to their destination.

4. The movement of people aims to avoid other people in the scene.

The deterministic approach treats minimizing the correspondence cost as a combinatorial optimization problem, where one solution is to use greedy search methods to obtain a one-to-one correspondence, see figure 8 [48].


Figure 8: a) all possible correspondences between points b) one-to-one correspondence [48]
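A greedy search for a one-to-one correspondence can be sketched as follows. The sketch assumes that the cost is the plain Euclidean distance between points and that the cheapest remaining pair is always accepted first; the function name and the max_distance parameter are hypothetical.

```python
import math


def greedy_correspondence(prev_points, curr_points, max_distance=math.inf):
    """Greedy one-to-one matching. Returns a dict: index in prev -> index in curr."""
    # All possible correspondences (figure 8a), cheapest first.
    candidates = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev_points)
        for j, c in enumerate(curr_points)
    )
    matches, used_prev, used_curr = {}, set(), set()
    for cost, i, j in candidates:
        if cost > max_distance:
            break  # every remaining pair is at least as expensive
        if i in used_prev or j in used_curr:
            continue  # keep the result one-to-one (figure 8b)
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches
```

A greedy search is fast but does not guarantee the lowest total cost; optimal assignment methods such as the Hungarian algorithm exist for that purpose.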

Statistical method

Uncertainty factors like noise in measurements and random perturbations in motion are almost impossible to avoid completely. The statistical method for correspondence takes these problems and uncertainties into account when estimating the object state, which makes it robust. It is, however, considered more complex to model than the deterministic method [48]. In short, this method relies on the probability of object movement [6].
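The text above does not name a specific estimator. As one illustration of how measurement noise can be taken into account, the sketch below uses a scalar Kalman filter, a widely used statistical estimator, to track a single coordinate. The state model (the position is assumed to stay roughly constant between frames), the variances, and the function name are assumptions made for this example.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=1.0, x0=0.0, p0=1.0):
    """Filter noisy scalar measurements; returns the estimate after each one."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + process_var
        # Update: weigh the measurement by how uncertain the prediction is.
        k = p / (p + measurement_var)  # Kalman gain, between 0 and 1
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

A large measurement variance makes the filter trust its own prediction more, which smooths out noisy detections at the price of reacting more slowly to real changes.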

4.4.2 Kernel tracking

The kernel tracking approach is based on computing the motion of an object represented by a primitive object region. By using a motion model, i.e. computing the motion of an object from one frame to another, it is possible to determine its next position. Depending on the purpose of the tracking, different parts of the estimation are important. For instance, when the trajectory is used to analyze object behaviour, only the motion is needed. However, the region enclosing the object also becomes important when identification of the object is needed [26]. Kernel, in this context, refers to the look of the object, i.e. its shape and appearance. Primitive shapes such as rectangle or ellipse templates are used to represent the object [28]. As seen in figure 5, the kernel tracking method is divided into two groups connected to the multi-view based appearance models and template based appearance models mentioned in section 4.1.2. The trackers in this category are, in other words, grouped by the appearance representation used. The template based appearance models are sometimes called single-view based approaches.


Figure 9: Tracking objects using simple geometric shapes [40]

Template based

Template based appearance models are considered relatively straightforward to use and less complex than other models. This has made them a popular choice of appearance model, and they have been widely used for a long time. With the template based approach, the methods for tracking objects differ depending on whether a single object or multiple objects are being tracked. These are therefore treated as two different cases.

Single object tracking

When tracking single objects, the most common approach is template matching. Different features like color or image intensity can be used to form the template [48]. The basic approach is to search for a specific template pattern in an image, thereby matching the template to a specific part of the image. The object template is generated from the previous frame. The method is widely used due to its flexibility and straightforward approach. However, it can be time-consuming for complex templates since it is a brute-force method [7, 43]. When using this method, the search area in the next frame is usually limited based on the object's position in the current frame [48]. More efficient template matching algorithms have also been suggested in recent years. For an image of size n and a template of size k, the number of operations is in the order of kn for a straightforward template matching implementation. Researchers have developed algorithms that perform the task in n operations [33].
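The straightforward implementation can be sketched as below. The sketch assumes grayscale images given as lists of rows and uses the sum of absolute differences as the measure of match, which is one common choice and not necessarily the one used in [7, 43]. The optional center and radius parameters, which limit the search area around the previous object position, are hypothetical names.

```python
def match_template(image, template, center=None, radius=None):
    """Return ((row, col), score) of the best match; lower score is better."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    rows = range(ih - th + 1)
    cols = range(iw - tw + 1)
    if center is not None and radius is not None:
        # Only search near the previous top-left position of the object.
        r0, c0 = center
        rows = range(max(0, r0 - radius), min(ih - th, r0 + radius) + 1)
        cols = range(max(0, c0 - radius), min(iw - tw, c0 + radius) + 1)

    best_pos, best_score = None, float("inf")
    for r in rows:  # roughly n candidate positions ...
        for c in cols:
            score = sum(  # ... times k pixel comparisons each
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

The two nested position loops and the inner sum over the template pixels are what give the kn operation count mentioned above; limiting the search area reduces the number of positions but not the cost per position.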

Multiple object tracking

Tracking multiple objects is a more complex task, where factors like interaction between objects and the background must be taken into account. One type of interaction is when one object partly or completely occludes another object. Modeling objects individually, as in single object tracking, does not account for these difficulties [48]. Different methods have been suggested to solve the problem. One suggestion is to consider the image as a set of layers, where the number of objects being tracked determines the number of layers in the image. The method also includes an additional background layer. Each layer contains models, such as a layer appearance model and a motion model, corresponding to the object being represented. The background layer is used to compensate for any background motion so that an object's motion can be calculated from the compensated image [38]. In this way occlusion can be handled explicitly. Another suggested method is to use Bayesian decision theory to track movements and detect occlusion. This is done by using color intensity and color histograms as feature representation and computing a similarity score for each detected pair. If the matching score is higher than a selected threshold value, the pair is considered to be a match, and the templates used for tracking are updated. If the score is lower than the threshold value, the object is investigated further to see whether occlusion has occurred. The object is divided into several subparts and the similarity score is calculated for each part. If one object part scores high enough while others score low, an occlusion is detected and the pair is still considered to be a match, although the template is not updated [49]. This method does not work if an object is completely occluded.
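The threshold logic of the second method can be sketched as follows. The sketch only covers the decision step described above, not the Bayesian formulation in [49]. Histogram intersection is assumed as the similarity score, and the function names, the return labels, and the threshold value are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify_pair(whole_score, part_scores, threshold=0.7):
    """Decide how to treat a candidate pair from its similarity scores."""
    if whole_score >= threshold:
        return "match_update_template"
    # Whole-object score too low: look at the subparts.
    some_high = any(s >= threshold for s in part_scores)
    some_low = any(s < threshold for s in part_scores)
    if some_high and some_low:
        # Partly visible: keep the match, but do not update the template.
        return "match_occluded_keep_template"
    return "no_match"
```

Under complete occlusion no subpart scores high, so the sketch returns no match, which mirrors the limitation noted above.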

Multi-view based

A problem with generating, for instance, a template to represent the object to be tracked is that the representation is usually based on the latest observation of the object. This means that the representation only considers the object from the currently visible view, in other words from one view only. This can be a problem with more complex objects that may appear different from different views. If the tracked object undergoes a drastic change in appearance or motion from one frame to another, the information is no longer relevant and the tracking may be lost. The problem can also occur with occlusion and with objects temporarily exiting the frame. There are methods that learn different views of the object and thereby overcome the problem [48]. One way to do this is a subspace-based approach, where the eigenspace is used to compute the transformation from an object image to a reconstruction of the image using eigenvectors [4].
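The subspace idea can be made more concrete with a short formulation. The notation is assumed for illustration and is not taken from [4]: x is the object image flattened into a vector, \mu is the mean of the training views, and the columns of U are the k leading orthonormal eigenvectors computed from the training views. The coefficients c, the reconstruction \hat{x}, and the reconstruction error e are then

```latex
\[
  c = U^{\top}(x - \mu), \qquad
  \hat{x} = \mu + U c, \qquad
  e = \lVert x - \hat{x} \rVert^{2}
\]
```

A small error e indicates that the candidate region is well explained by the learned views. This is a plain least-squares projection; the method in [4] may differ in detail.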

4.4.3 Silhouette tracking

This method is occasionally called region tracking, or region-based tracking. Representing objects with simple geometric shapes can be inadequate for some more complex objects, like hands or heads. When an object such as a human is represented with simple shapes like cylindrical or skeleton models, the tracking may be insufficient due to poor object representation. With the silhouette method it is possible to get an accurate shape description [31]. The object models used in the tracking, which can be in the form of color histograms, object edges, or object contours, are created using the previous frames. The goal is then to find a region in the current frame that matches the model. Like the other categories, this category is divided into two subcategories, shape matching and contour evolution [48].


Figure 10: Tracking objects using object silhouettes [21]

Shape matching

The basic approach of shape matching is to use a similarity measure to quantify the resemblance between two shapes, which means that similarity measures are an essential part of this method. The algorithm used for this purpose depends on the required properties and the specific matching problem [41].

This method can be performed similarly to the template based approach covered in section 4.4.2. A model is generated based on the object silhouette in the previous frame and used to compute the similarity of the object in the current frame. In this approach, the object model is often in the form of color or edge histograms, object edges, or a combination of these models. To handle tracking difficulties like non rigid object motion, changes of illumination, or viewpoint changes, the object model is updated every frame after detection. Another approach is to look at two consecutive frames and match shapes by finding corresponding silhouettes, called silhouette matching. Point tracking, which is discussed in section 4.4.1, is in some ways similar to silhouette matching. The main difference between the two is how the objects are represented and which object model is used. The point tracking method only makes use of features connected to motion and position, in contrast to silhouette matching, which also makes use of appearance features since the whole object region is used [48].
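One simple similarity measure for silhouettes is the overlap ratio between two object regions. The sketch below is only one possible measure, chosen here for illustration; it assumes that a silhouette is given as a set of (row, column) pixel coordinates, and the function names are hypothetical.

```python
def silhouette_overlap(mask_a, mask_b):
    """Jaccard index of two silhouettes given as sets of (row, col) pixels."""
    if not mask_a and not mask_b:
        return 1.0  # two empty silhouettes are treated as identical
    return len(mask_a & mask_b) / len(mask_a | mask_b)


def best_silhouette(model, candidates):
    """Index of the candidate silhouette that overlaps the model the most."""
    return max(range(len(candidates)), key=lambda i: silhouette_overlap(model, candidates[i]))
```

A purely geometric measure like this ignores appearance; in practice it would be combined with, or replaced by, the color and edge histograms mentioned above.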

Contour evolution

Contour tracking methods, occasionally called boundary tracking, use the contour from the previous frame as an initial contour in the current frame and evolve it into a new contour for the current object position. The method uses edge-based features, which are insensitive to illumination changes, and this makes it robust. It is also faster than shape matching, since the boundary covers a smaller area than the whole object region [13]. For this method to work, at least some part of the object region in the previous frame must be overlapped by the object region in the current frame [48].


Chapter 5

Result

This section contains summaries of the point tracking approach, the kernel tracking approach, and the silhouette tracking approach. A short method summary is provided for each category and subcategory, together with a recommended usage area and the advantages and disadvantages of the technique in question. Each of these techniques is followed by a summary of the covered subcategories, including usage area, advantages, and disadvantages.

Advantages and disadvantages for each category and subcategory are also presented as a table in appendix A for a better overview.

Point tracking

The point tracking approach is used when detected objects are represented as points. Tracking is performed by evaluating the object's state in terms of position and motion and by associating points across frames.

• When used: When detected objects are represented as points.

• Suitable for: Objects that occupy a small region in the video, preferably small enough to be represented by a single point.

• Advantages: Not sensitive to illumination changes.

• Disadvantages: Requires an external mechanism to detect objects in every frame. Problem areas include handling occlusion (especially for objects represented by multiple points), misdetection, and entries and exits of objects. Hard to distinguish between multiple objects and between object and background.


Deterministic method

• Suitable for: Objects whose paths can be anticipated using a trajectory prototype. Humans are, for instance, more suitable to track with this method than animals, because the unpredictable movement patterns of animals make it harder to create prototypes.

• Advantages: Deterministic methods are usually scalable since they can always be run in parallel.

• Disadvantages: Since the objects are assumed to follow a specific trajectory prototype, problems arise if the rules underlying the prototype are wrong, for instance because of complex movements. The tracking might be lost in such a scenario.

Statistical method

• Suitable for: Situations with noise in measurements or uncertainties like random perturbations in motion.

• Advantages: By taking uncertainties, problems, or noise in measurements into account when estimating the object state, it becomes a reliable way of predicting an object's path.

• Disadvantages: Considered more complex to model than the deterministic method.

Kernel tracking

The kernel tracking approach is based on computing the motion of an object. By computing the motion of an object from one frame to another, it is possible to determine its next position.

• When used: When primitive shapes like rectangle or ellipse templates are used to represent the object.

• Suitable for: Simple shaped, rigid objects.

• Advantages: Object detection is only needed when the object first appears in the scene.

• Disadvantages: Due to the primitive geometric shape representation, parts of the object may be left outside the defined shape while parts of the background may reside inside it. This problem can occur for both rigid and non rigid objects. Usually does not handle occlusion explicitly.

Template based

• Suitable for: Objects that do not undergo a change in appearance when viewed from different angles, or a drastic change in motion pattern from one frame to another.

• Advantages: Considered relatively easy and straight-forward to use.


• Disadvantages: Can be time consuming for complex templates. Problems can occur when objects are occluded or temporarily exit the scene.

Multi-view based

• Suitable for: Objects that are not appropriate to represent from one view only, due to e.g. motion or appearance changes.

• Advantages: Flexible due to the ability to handle more complex object appearances and motion changes.

• Disadvantages: More complex than the template based method.

Silhouette tracking

The silhouette tracking approach is used when object representation with simple geometric forms or dots is inadequate. Object models are created using the previous frame, and the goal is to find a matching region in the current frame.

• When used: When the contour or silhouette of an object is used for object representation.

• Suitable for: Complex shaped and/or non rigid objects.

• Advantages: Flexible in object representation, can handle a large variety of object shapes, including complex non rigid shapes. Object detection is only needed when the object first appears in the scene.

• Disadvantages: Usually does not handle occlusion explicitly.

Shape matching

• Suitable for: Objects where representation of the whole object region is needed, for instance when using color as feature selection.

• Advantages: Possible to combine with object recognition since the whole object region is used.

• Disadvantages: Can be time consuming.

Contour evolution

• Suitable for: Objects where it suffices to use the object contour as object representation, for instance when only motion patterns are needed, and not object recognition.

• Advantages: Insensitive to illumination changes, robust. Faster than shape matching since it does not use the whole object region.

• Disadvantages: Does not work when objects are completely occluded.


Chapter 6

Discussion

While making this study it has become clear that there is some consensus regarding which categories of object tracking techniques exist today. Most of the reviewed papers relevant to this study describe object tracking techniques using the same categorisation covered in this report. This has made it easier to find similarities between papers and to summarize the techniques existing today. On the other hand, it is possible that the researchers have influenced each other heavily and thereby disregarded other available techniques, or not presented any new perspectives on the commonly occurring techniques.

A lot of literature exists regarding different object tracking algorithms, their advantages, disadvantages, and differences. The aim of this study was, however, to make a comprehensible summary of different techniques at a high level of abstraction without involving specific algorithms. Finding information about the object tracking techniques, or categories, covered in this study has not been as easy, which made it a challenge to summarize and compare them. It is also important to remember that the result is a generalization of the techniques, bringing forward the most suitable environments and the most common problems. However, research exists in each category with suggested algorithms containing workarounds for most of the problems and difficulties presented. The accuracy of the suggested algorithms does, of course, vary.

The table presented in appendix A provides an overview of the core of the result. It is easy to see that there are both advantages and disadvantages with all three of the point tracking, kernel tracking, and silhouette tracking methods. However, the point tracking approach and the kernel tracking approach have more disadvantages than the silhouette tracking approach, which in turn has more advantages than the other two techniques. This is not surprising, and it does not mean that the silhouette tracking method is superior in all situations. It is, generally speaking, a more complex approach, which goes in line with its flexibility and broad area of application. This is further elaborated in chapter 7.

The result shows that occlusion is a recurring problem, and it seems to be one of the most challenging problems in the area of object tracking. It is therefore an important subject and a reason for researchers to keep developing more efficient and exact algorithms.

Occlusion can occur in several situations. Self occlusion is when some part of the object occludes another part. The problem often arises with articulated objects, for instance when representing a person. Another type of occlusion is when two or more tracked objects occlude each other. This is called interobject occlusion. Similarly, objects in the background can occlude tracked objects, which is simply called occlusion by the background [48]. There are different approaches for handling this kind of disturbance, such as assuming constant object velocity, which makes it possible to calculate an object's new position even though the tracking might be lost in some frames due to occlusion. However, since object tracking algorithms vary greatly even when they are based on the same concept, and hence belong to the same object tracking category, it is impossible to determine exactly how each method covered in this study should handle occlusion. This needs to be discussed and handled on a case by case basis.
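The constant velocity assumption mentioned above amounts to a simple extrapolation. In the sketch below, the function name and parameters are hypothetical, and positions and velocities are assumed to be (x, y) pairs measured in pixels and pixels per frame.

```python
def predict_through_occlusion(last_position, velocity, frames_missing):
    """Extrapolate an object's position after it has been hidden for some frames."""
    x, y = last_position
    vx, vy = velocity
    return (x + vx * frames_missing, y + vy * frames_missing)


# Object last seen at (10, 20), moving 2 px right and 1 px up per frame,
# hidden for three frames.
predicted = predict_through_occlusion((10, 20), (2, -1), 3)
```

The prediction is only as good as the assumption: the longer the object stays hidden, the more a change in speed or direction will move it away from the predicted position.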

As stated in the result, occlusion in the point tracking approach can be problematic when tracking multiple, large objects that are represented using multiple points, since it can be hard to determine which points belong to a certain tracked object and which come from an occluding object. One can also argue that partial or complete occlusion can be problematic when tracking non rigid objects, since it can be difficult to differentiate between the object changing its shape and the object getting occluded. Generally speaking, knowing that the object shape and size stay the same during the entire tracking process (i.e. rigid objects), together with limitations such as constant velocity, should reduce the complexity of handling object occlusion.


Chapter 7

Conclusions

The result suggests that the most common categories of object tracking techniques are suitable for different applications, or environments. It is not possible to select one of these methods that would be superior to the others in all scenarios. One could argue that the silhouette tracking approach is the most comprehensive method, since it can handle a large variety of objects and is not restricted in representing object shapes in the way the other methods are. On the other hand, it may be unduly complex in situations where simple, rigid objects are used, which could possibly be represented using only a dot or a geometric shape such as a rectangle. It goes without saying that creating an object representation using a simple geometric shape, for instance a rectangle where only object width and height are needed, is less complex than creating a silhouette using the exact object outline. On the other hand, an exact outline may be needed in other situations to make the object distinguishable.

7.1 Future work

Some steps in the object tracking process are mostly done manually; feature selection is one example. The accuracy of object tracking could potentially increase by developing methods for a more automatic feature selection process. We know from experience that a human tends to make more mistakes than a computer program optimized for a certain purpose.

Automatic feature selection has received attention in the area of pattern recognition, where methods for this purpose are divided into filter methods and wrapper methods [48]. However, these have not received the same attention in the area of object tracking, where feature selection is still mostly done manually. There could be room for improvement in object tracking by developing fast and accurate methods for automatic feature selection.

A suitable continuation of this thesis would be to make an easy, comprehensible summary of the most common object tracking algorithms, thereby extending this work.


7.2 Concluding remarks

This thesis is the result of a thorough literature study of the most common techniques in the area of object tracking. This includes the need to represent the object in a way that makes sense to the computer and ways to achieve this, methods for detecting the object of interest, and methods for tracking the objects over succeeding frames in a video. The methods are divided into three categories based on the expected object representation, namely methods establishing point correspondence, methods using primitive geometric models, and methods using contour evolution. These are further divided into subgroups based on algorithm approaches. The thesis provides a summary of the most common object tracking methods, which hopefully can give valuable insight into this topic.


References

[1] Literature review. http://linguistics.byu.edu/faculty/henrichsen/ResearchMethods/RM_3_03.html. [Online; accessed 02-December-2016].

[2] J Joshan Athanesious and P Suresh. Systematic survey on object tracking methods in video. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) October, pages 242–247, 2012.

[3] Thaddeus Beier and Shawn Neely. Feature-based image metamorphosis. In ACM SIGGRAPH Computer Graphics, volume 26, pages 35–42. ACM, 1992.

[4] Michael J Black and Allan D Jepson. Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1):63–84, 1998.

[5] John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, (6):679–698, 1986.

[6] Duc Phu Chau, Francois Bremond, and Monique Thonnat. Object tracking in videos: Approaches and issues. arXiv preprint arXiv:1304.5212, 2013.

[7] Greg S Cox. Template matching and measures of match in image processing. University of Cape Town, South Africa, 1995.

[8] Barga Deori and Dalton Meitei Thounaojam. A survey on moving object tracking in video. International Journal on Information Theory (IJIT), 3(3):31–46, 2014.

[9] Educause. Image processing: Morphing. http://www.owlnet.rice.edu/~elec539/Projects97/morphjrks/morph.html, 1997. [Online; accessed 5-December-2017].

[10] Dresden International Graduate School for Biomedicine and Bioengineer. Basics of quantitative image analysis. http://www.pasteur.gr/wp-content/uploads/Basics-of-Quantitative-image-analysis.pdf. [Online; accessed 11-January-2017].

[11] Investopedia. Posterior probability. http://www.investopedia.com/terms/p/posterior-probability.asp. [Online; accessed 16-December-2016].

[12] Investopedia. Prior probability. http://www.investopedia.com/terms/p/prior_probability.asp. [Online; accessed 16-December-2016].

[13] Ann Maria Jacob and J Anitha. Inspection of various object tracking techniques. International Journal of Engineering and Innovative Technology, 2(6):118–124, 2012.


[14] G Jemilda, S Baulkani, D George Paul, and J Benjamin Rajan. Tracking moving objects in video. Journal of Computers, 12(3):221–229, 2017.

[15] Kinjal A Joshi and Darshak G Thakore. A survey on moving object detection and tracking in video surveillance system. International Journal of Soft Computing and Engineering, 2(3):44–48, 2012.

[16] Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. International journal of computer vision, 1(4):321–331, 1988.

[17] Kenneth I Laws. Textured image segmentation. Technical report, DTIC Document, 1980.

[18] LessWrongWiki. Bayesian decision theory. https://wiki.lesswrong.com/wiki/Bayesian_decision_theory. [Online; accessed 4-January-2017].

[19] Hui Li and Yanjiang Wang. Object of interest tracking based on visual saliency and feature points matching. In 6th International Conference on Wireless, Mobile and Multi-Media (ICWMMN 2015), pages 201–205. IET, 2015.

[20] Tony Lindeberg. Scale selection properties of generalized scale-space interest point detectors. Journal of Mathematical Imaging and Vision, 46(2):177–210, 2013.

[21] Bioprobes Ltd. http://www.bioprobeshk.com/catalog/products.php?cPath=37. [Online; accessed 15-January-2017].

[22] Shipra Ojha and Sachin Sakhare. Image processing techniques for object tracking in video surveillance-a survey. In Pervasive Computing (ICPC), 2015 International Conference on, pages 1–6. IEEE, 2015.

[23] Bruno A Olshausen. Bayesian probability theory. http://redwood.berkeley.edu/bruno/npb163/bayes.pdf, 2004. [Online; accessed 16-December-2016].

[24] Jiyan Pan and Bo Hu. Robust occlusion handling in object tracking. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.

[25] Seamour Papert. The summer vision project. https://dspace.mit.edu/bitstream/handle/1721.1/6125/AIM-100.pdf?sequence=2, 1966. [Online; accessed 18-November-2016].

[26] J. Meja Patel and Bhumika Bhatt. A comparative study of object tracking techniques. International Journal of Innovative Research in Science, Engineering and Technology, 4(3), 2015.

[27] Sandeep Kumar Patel and Agya Mishra. Moving object tracking techniques: A critical review. Indian Journal of Computer Science and Engineering, 4(2):95–102, 2013.

[28] R. Hemangi Patil and K. S. Bhagat. Detection and tracking of moving object: A survey. Int. Journal of Engineering Research and Applications, 5(11):138–142, 2015.

[29] Massimo Piccardi. Background subtraction techniques: a review. In Systems, man and cybernetics, 2004 IEEE international conference on, volume 4, pages 3099–3104. IEEE, 2004.


[30] Mael Primet and Lionel Moisan. Point tracking: an a-contrario approach. 2012.

[31] Bodo Rosenhahn, Uwe Kersting, Smith Andrew, Thomas Brox, Reinhard Klette, and Hans-Peter Seidel. A silhouette based human motion tracking system. Technical report, CITR, The University of Auckland, New Zealand, 2005.

[32] Baidya Nath Saha, Nilanjan Ray, and Hong Zhang. Automating snakes for multiple objects detection. In Asian Conference on Computer Vision, pages 39–51. Springer, 2010.

[33] Haim Schweitzer, JW Bell, and Feng Wu. Very fast template matching. In European Conference on Computer Vision, pages 358–372. Springer, 2002.

[34] Paul Scovanner and Marshall F Tappen. Learning pedestrian dynamics from the real world. In ICCV, volume 9, pages 381–388, 2009.

[35] Grandham Sindhuja and Renuka Devi. A survey on detection and tracking of objects in video sequence. International Journal of Engineering Research and General Science, 3(2), 2015.

[36] Min Sun and Krstic Srdjan. Optical flow. http://www.cs.princeton.edu/courses/archive/fall08/cos429/optiflow.pdf, 2016. [Online; accessed 06-December-2016].

[37] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.

[38] Hai Tao, Harpreet S Sawhney, and Rakesh Kumar. Object tracking with bayesian estimation of dynamic layer representations. IEEE transactions on pattern analysis and machine intelligence, 24(1):75–89, 2002.

[39] Pennsylvania State University. Probability density functions. https://onlinecourses.science.psu.edu/stat414/node/97, 2016. [Online; accessed 06-December-2016].

[40] Mahadevan V and Vasconcelos N. Saliency based discriminant tracking. http://www.svcl.ucsd.edu/projects/tracking/. [Online; accessed 2-January-2017].

[41] Remco C Veltkamp. Shape matching: similarity measures and algorithms. In Shape Modeling and Applications, SMI 2001 International Conference on, pages 188–197. IEEE, 2001.

[42] David Vernon. Machine vision. Pearson Education Limited, 1991.

[43] Adaptive Vision. Template matching. http://docs.adaptive-vision.com/current/studio/machine_vision_guide/TemplateMatching.html. [Online; accessed 16-November-2016].

[44] Ross Whitaker. Graph cuts approach to the problems of image segmentation. http://www.coe.utah.edu/~cs7640/readings/graph_cuts_intro.pdf. [Online; accessed 16-December-2016].

[45] Wikipedia. Active contour model — wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Active_contour_model&oldid=756446993, 2016. [Online; accessed 5-January-2017].


[46] Wikipedia. Interpolation (computer graphics) — wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Interpolation_(computer_graphics)&oldid=715584841, 2016. [Online; accessed 18-December-2016].

[47] Andrew Willis and Yunfeng Sui. An algebraic model for fast corner detection. In 2009 IEEE 12th International Conference on Computer Vision, pages 2296–2302. IEEE, 2009.

[48] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM computing surveys (CSUR), 38(4):13, 2006.

[49] Yan Zhou, Bo Hu, and Jianqiu Zhang. Occlusion detection and tracking method based on bayesian decision theory. In Pacific-Rim Symposium on Image and Video Technology, pages 474–482. Springer, 2006.


Appendix A

Summary table of results

| Method | Advantages | Disadvantages |
|---|---|---|
| Point tracking | Not sensitive to illumination changes | Requires external mechanism to detect object in every frame. Problem areas in handling occlusion (especially for objects represented by multiple points), misdetection, entries and exits of objects. Hard to distinguish between multiple objects and object/background |
| Kernel tracking | Object detection is only needed when the object first appears in the scene | Due to primitive geometric shape representation, parts of the objects may be left outside of the defined shape while parts of the background may reside inside it. This problem can occur for both rigid as well as non rigid objects. Usually does not handle occlusion explicitly |
| Silhouette tracking | Flexible in object representation, can handle a large variety of object shapes, including complex non rigid shapes. Object detection is only needed when the object first appears in the scene | Usually does not handle occlusion explicitly |


| Method | Advantages | Disadvantages |
|---|---|---|
| Deterministic (Point tracking) | Deterministic methods are usually scalable since they can always be run in parallel | Since the objects are assumed to follow some specific trajectory prototype, problems arise if the rules underlying the prototype are wrong, for instance because of complex movements. The tracking might be lost in a scenario like this |
| Statistical (Point tracking) | By taking any uncertainties, problems or noise in measurements into account when estimating the object state, it becomes a reliable way of predicting an object's path | Considered more complex to model than the deterministic method |
| Template based (Kernel tracking) | Considered relatively easy and straightforward to use | Can be time consuming for complex templates. Problems can occur when objects are occluded or temporarily exit the scene |
| Multi-view based (Kernel tracking) | Suited to objects that are not appropriate to represent from one view only, due to e.g. motion or appearance changes | More complex than the template based method |
| Shape matching (Silhouette tracking) | Flexible in object representation, can handle a large variety of object shapes, including complex non-rigid shapes. Object detection is only needed when the object first appears in the scene | Can be time consuming |
| Contour evolution (Silhouette tracking) | Insensitive to illumination changes, robust. Faster than shape matching since it does not use the whole object region | Does not work when objects are completely occluded |


Appendix B

List of concepts

• Bayesian probability: Bayesian probability provides one of the most important mathematical frameworks for reasoning and making decisions under uncertainty. It is used for performing inference, or reasoning, using probability [23].

• Bayesian decision theory: A decision theory which is informed by Bayesian probability. It attempts to quantify the tradeoffs between various decisions using probabilities and costs [18].

• Prior probability: A concept within Bayesian statistics which expresses one's beliefs about a quantity before evidence is taken into account. The probability is updated when new information is provided. It is, in other words, the probability of a certain outcome of an event before it has occurred [12].

• Posterior probability: The updated probability of an event occurring after new information is taken into account. It is usually calculated by updating the prior probability. Given that event A has occurred, the posterior probability describes the probability of event B occurring [11].

• Graph cuts: The partitioning of a graph into two disjoint subsets. Graph cuts are used to solve a wide variety of low-level problems within the area of computer vision. Image smoothing is one example [44].

• Image morphing: The process of transforming one image into another by using a sequence of intermediate images. When put together with the original image, these images represent the transition from the original image to the other [9].

• View interpolation: A method used in the area of computer animation for drawing images semi-automatically. Polynomial interpolation is usually used to calculate the frames between key frames [46].

• Quantitative image analysis: The process of collecting quantitative data from an image, for instance the maximum or minimum value of color intensity [10].
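The relation between the prior and posterior probability defined in the list above can be illustrated with a small sketch of Bayes' rule for a binary hypothesis. The function name `posterior`, the occlusion scenario, and all numeric values are assumptions made for illustration only; they are not taken from any of the methods reviewed in this paper.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Update a prior belief in a hypothesis H after observing evidence E.

    Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), where
    P(E) = P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability in [0, 1]")
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the observed evidence has zero probability under both hypotheses")
    return p_evidence_if_true * prior / evidence


if __name__ == "__main__":
    # Hypothetical scenario: H = "the tracked object is occluded".
    # Assumed values: prior P(H) = 0.2, P(low match score | H) = 0.9,
    # P(low match score | not H) = 0.1.
    belief = 0.2
    for frame in (1, 2):
        # A low match score is observed in each frame; the posterior of
        # one frame becomes the prior of the next.
        belief = posterior(belief, 0.9, 0.1)
        print(f"frame {frame}: P(occluded | evidence) = {belief:.3f}")
```

Under these assumed numbers, one low match score raises the belief from 0.2 to roughly 0.69, and a second consecutive one raises it to roughly 0.95, which shows how the probability is updated as new information arrives.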
