Countering bias in tracking evaluations

Gustav Häger, Michael Felsberg and Fahad Khan

Conference article. Cite this conference article as: Häger, G. Countering bias in tracking evaluations. In Francisco Imai, Alain Tremeau and Jose Braz (eds), Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2018, Volume 5, pp. 581–587. ISBN: 978-989-758-290-5. DOI: https://doi.org/10.5220/0006714805810587

Copyright: The self-archived postprint version of this conference article is available at Linköping University Institutional Repository (DiVA): http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151306


Countering bias in tracking evaluations

Gustav Häger, Michael Felsberg and Fahad Khan
Computer Vision Lab, Linköping University

[firstname].[lastname]@liu.se

Keywords: Tracking, Evaluation

Abstract: Recent years have witnessed a significant leap in visual object tracking performance, mainly due to powerful features, sophisticated learning methods and the introduction of benchmark datasets. Despite this significant improvement, the evaluation of state-of-the-art object trackers still relies on the classical intersection over union (IoU) score. In this work, we argue that object tracking evaluations based on the classical IoU score are sub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of large target objects and favors over-estimated target prediction sizes. As our second contribution, we propose a new score that is unbiased with respect to target prediction size. We systematically evaluate our proposed approach on benchmark tracking data with variations in relative target size. Our empirical results clearly suggest that the proposed score is unbiased in general.

1 Introduction

Significant progress has been made on challenging computer vision problems, including object detection and tracking, during the last few years (Kristan et al., 2016), (Russakovsky et al., 2015). In object detection, the task is to simultaneously classify and localize an object category instance in an image, whereas visual tracking is the task of estimating the trajectory and size of a target in a video. Generally, the evaluation methodologies employed to validate the performance of both object detectors and trackers are based on the intersection over union (IoU) score. The IoU provides an overlap score for comparing the outputs of detection/tracking methods with the given annotated ground truth. Despite its widespread use, little research has been done on the implications of the IoU score for object detection and tracking performance evaluations.

Recent years have seen a significant boost in tracking performance, both in terms of accuracy and robustness. This jump in tracking performance is mainly attributed to the introduction of benchmark datasets, including the visual object tracking (VOT) benchmark (Kristan et al., 2016). In the VOT benchmark, object trackers are ranked according to their accuracy and robustness. The accuracy is derived from the IoU score (Jaccard, 1912), (Everingham et al., 2008), while the robustness is related to how often a particular tracker loses the object. Different from VOT, the online tracking benchmark (OTB) (Wu et al., 2015) only takes accuracy into account, again using evaluation methodologies based on the IoU criterion. Both the VOT and OTB benchmarks contain target objects of sizes ranging from less than one percent to approximately 15% of the total image area. The lack of larger objects in object tracking benchmarks is surprising, as such objects directly correspond to situations where the tracked object is close to the camera. However, in such cases, the de facto tracking evaluation criterion based on the IoU score is sub-optimal due to its bias towards over-estimated target size predictions. In this work, we theoretically show that the standard IoU is biased since it only considers the ground-truth and target prediction areas, while ignoring the remaining image area (see figure 1).

When dealing with large target objects, a naive strategy is to over-estimate the target size by simply outputting the entire image area as the predicted target region (see figure 1). Ideally, such a naive strategy would be penalized by the standard tracking evaluation measure based on the IoU score. Surprisingly, this is not the case (Felsberg et al., 2016). The IoU-based standard evaluation methodology fails to significantly penalize such an over-estimated target prediction, thereby highlighting the bias within the IoU score.

In this paper, we provide a theoretical proof that the standard IoU score is biased in the case of large target objects. To counter this problem, we propose an unbiased approach that accounts for the total image area. Our new score is symmetric with respect to errors in target prediction size. We further validate our proposed score with a series of systematic experiments simulating a wide range of target sizes in tracking scenarios. Our results clearly demonstrate that the proposed score is unbiased and is more reliable than the standard IoU score when performing tracking evaluations on videos with a wide range of target sizes.

Figure 1: An example image, where the target (red) covers a large area of the image. The tracker outputs a target prediction (blue) covering the entire image. The standard IoU score equals the ratio between the size of the target and the total image size, assigning an overlap score of 0.36. Our proposed unbiased score assigns a significantly lower score of 0.11, as it penalizes the severe over-estimation of the target size.

2 Related work

A significant leap in performance has been witnessed in recent years for both object detection and tracking (Kristan et al., 2014), (Everingham et al., 2008). Among other factors, this dramatic improvement in tracking and detection performance is attributed to the availability of benchmark datasets (Kristan et al., 2014), (Russakovsky et al., 2015). These benchmark datasets enable the construction of new methods by providing a mechanism for systematic performance evaluation against existing approaches. Therefore, it is imperative to have a robust and accurate performance evaluation score that is consistent over different scenarios. Within the areas of object detection and tracking (Everingham et al., 2008), (Wu et al., 2013), (Russakovsky et al., 2015), standard evaluation methodologies are based on the classical intersection over union (IoU) score. The IoU score, also known as the Jaccard overlap (Jaccard, 1912), takes into account both the intersection and the union between the ground truth and the target prediction, and can equivalently be seen as a similarity measure between a pair of binary feature vectors. Despite its widespread use, the IoU score struggles with large target objects.

Apart from the IoU score, the F1 score is commonly employed in medical imaging and text processing. The F1 score is computed as the harmonic mean of the precision and recall scores and can be viewed as analogous to the IoU score. However, a drawback of the F1 score is its inability to deal with highly skewed datasets, as it does not sufficiently account for the true negatives obtained during the evaluation (Powers, 2011). Most tracking benchmarks are highly skewed, as they contain significantly more pixels annotated as background than as the target object. Surprisingly, this problem has not been investigated in the context of object detection and visual tracking.
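To make the insensitivity to true negatives concrete, the short Python sketch below (our own illustration, not part of any evaluation toolkit; all counts are made up) computes precision, recall, F1 and IoU from hypothetical pixel counts. Growing the background changes neither score, which is exactly the skew problem discussed above.

# Illustration only: F1 and IoU never look at true negatives, so enlarging the
# background (tn) leaves both scores unchanged.
def f1_and_iou(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    iou = tp / (tp + fp + fn)
    return f1, iou

print(f1_and_iou(tp=900, fp=300, fn=100, tn=10_000))     # small image
print(f1_and_iou(tp=900, fp=300, fn=100, tn=1_000_000))  # huge background, identical scores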

In the context of object detection, the issue of skewed data is less apparent, since the overlap between the target prediction and the ground truth is only required to be greater than a certain threshold (typically 0.5). Different from object detection, the evaluation criterion in object tracking is not limited to a single threshold value. Instead, tracking performance is evaluated over a range of different threshold values. Further, the final tracking accuracy is computed as the average overlap between the target prediction and the ground truth over all frames in a video. In this work, we investigate the consequences of employing the IoU metric in visual object tracking performance evaluation.

3 Overlap scores

Here we analyze the traditional intersection over union (IoU) score $O$ and prove that it is biased with regard to the prediction size. The reasoning is built on the notion of four different regions of the image given by the annotation bounding box, $S_a$, and the detection bounding box, $S_d$. These areas are: $A_{ad} = |S_a \cap S_d|$ (true positive), $A_{\bar{a}d} = |S_d \cap \bar{S}_a|$ (false positive), $A_{a\bar{d}} = |\bar{S}_d \cap S_a|$ (false negative), and $A_{\bar{a}\bar{d}} = |\bar{S}_d \cap \bar{S}_a|$ (true negative).

3.1 Bias analysis for intersection over union

The classical IoU $O$ measures the overlap as the ratio of the intersection area between detection and ground truth to the union area:

$$ O = \frac{A_{ad}}{A_{ad} + A_{a\bar{d}} + A_{\bar{a}d}} \qquad (1) $$
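As a concrete reading of these definitions, the following Python sketch (our own; the (x0, y0, x1, y1) box representation and function names are illustrative choices, not prescribed by the paper) computes the four region areas and the classical IoU of Eq. (1) for axis-aligned boxes.

def region_areas(annotation, detection, image_area):
    # Boxes are (x0, y0, x1, y1); the returned values correspond to A_ad (TP),
    # A_{a,not d} (FN), A_{not a,d} (FP) and A_{not a,not d} (TN).
    ax0, ay0, ax1, ay1 = annotation
    dx0, dy0, dx1, dy1 = detection
    iw = max(0.0, min(ax1, dx1) - max(ax0, dx0))
    ih = max(0.0, min(ay1, dy1) - max(ay0, dy0))
    a_ad = iw * ih                                        # true positive
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_d = (dx1 - dx0) * (dy1 - dy0)
    a_a_notd = area_a - a_ad                              # false negative
    a_nota_d = area_d - a_ad                              # false positive
    a_nota_notd = image_area - area_a - area_d + a_ad     # true negative
    return a_ad, a_a_notd, a_nota_d, a_nota_notd

def iou(annotation, detection, image_area):
    a_ad, a_a_notd, a_nota_d, _ = region_areas(annotation, detection, image_area)
    return a_ad / (a_ad + a_a_notd + a_nota_d)            # Eq. (1)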


Figure 2: One-dimensional image (black) with the annotation (red) and the bounding box (green). Image coordinates range from 0 to 1.

While the IoU behaves satisfactorily for objects of smaller size, it does not function well when objects are large enough to cover a significant portion of the image. For an annotation covering most of the image area, it is possible for the tracker to set the prediction size to cover the full image and still maintain a good overlap.

We will now show that the IoU is biased with respect to the size of the prediction. To simplify the derivation, we consider without loss of generality a one-dimensional image. As the IoU only considers the areas of prediction and ground truth, extending the reasoning to two-dimensional images is trivial. A visualization of the one-dimensional image, with the annotated bounding interval in red and the detection interval in green, can be seen in figure 2.

In one dimension the annotation is an interval starting at $A_s$ and ending at $A_e$. The prediction interval starts at $D_s$ and ends at $D_e$. A small perturbation of $D_s$ or $D_e$ will change the overlap interval $I_{ad}$ or the false positive interval $I_{\bar{a}d}$, respectively. As we will now show, the IoU is not the optimal choice for overlap comparison, as it does not treat errors in position and size equally.

We assume that the overlap is non-empty, i.e., $D_s < A_e \wedge D_e > A_s$. We then get four different cases of imperfect alignment ($I_a \cap I_d = I_{ad}$ and $I_a \cup I_d = I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}$; figure 2 shows case 4):

case  boundaries                      $I_a \cap I_d$    $I_a \cup I_d$
1.    $D_s < A_s \wedge D_e > A_e$    $A_e - A_s$       $D_e - D_s$
2.    $D_s < A_s \wedge D_e < A_e$    $D_e - A_s$       $A_e - D_s$
3.    $D_s > A_s \wedge D_e < A_e$    $D_e - D_s$       $A_e - A_s$
4.    $D_s > A_s \wedge D_e > A_e$    $A_e - D_s$       $D_e - A_s$

By considering a change in position of the bounding box $\varepsilon_p$, $[D_s; D_e] \mapsto [D_s + \varepsilon_p; D_e + \varepsilon_p]$, and a small change in size $\varepsilon_s$, $[D_s; D_e] \mapsto [D_s - \varepsilon_s; D_e + \varepsilon_s]$, we compute the effect each has on the resulting IoU. Starting with the size change, the IoU for case 4 becomes:

$$ O = \frac{A_e - (D_s - \varepsilon_s)}{D_e + \varepsilon_s - A_s} = \frac{A_e - D_s + \varepsilon_s}{D_e - A_s + \varepsilon_s} \qquad (2) $$

Taking the derivative with respect to $\varepsilon_s$ yields

$$ \frac{\partial O}{\partial \varepsilon_s} = \frac{(D_e - A_s + \varepsilon_s) - (A_e - D_s + \varepsilon_s)}{(D_e - A_s + \varepsilon_s)^2} \qquad (3) $$

At the stationary solution, i.e., $\varepsilon_s = 0$, we thus obtain

$$ \lim_{\varepsilon_s \to 0} \frac{\partial O}{\partial \varepsilon_s} = \frac{(D_e - A_s) - (A_e - D_s)}{(D_e - A_s)^2} = \frac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} > 0 \qquad (4) $$

where we have used the fact that $A_e - D_s = I_{ad}$ and $D_e - A_s = I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}$.

Following the same procedure for the case of a change in position, we get for $\varepsilon_p$:

$$ O = \frac{A_e - (D_s + \varepsilon_p)}{D_e + \varepsilon_p - A_s} = \frac{A_e - D_s - \varepsilon_p}{D_e - A_s + \varepsilon_p} \qquad (5) $$

$$ \lim_{\varepsilon_p \to 0} \frac{\partial O}{\partial \varepsilon_p} = -\frac{(D_e - A_s) + (A_e - D_s)}{(D_e - A_s)^2} = -\frac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) + I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} < 0 \qquad (6) $$

Computing both derivatives for all cases 1.-4. results in the following table:

case  $\varepsilon_s$                                                                                            $\varepsilon_p$
1.    $-\dfrac{2 I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} < 0$                                          $0$
2.    $\dfrac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} > 0$    $\dfrac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) + I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} > 0$
3.    $\dfrac{2 (I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} > 0$           $0$
4.    $\dfrac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} > 0$    $-\dfrac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) + I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} < 0$

The zero entries above imply that if the annotation lies completely inside the detection (case 1) or the detection lies completely inside the annotation (case 3), an incremental shift does not change the IoU measure. The negative/positive derivatives with respect to $\varepsilon_s$ in cases 1 and 3 lead to a compensation of a too large/too small detection. The positive/negative derivatives with respect to $\varepsilon_p$ in cases 2 and 4 lead to a compensation of a detection displacement to the left/right. The problematic entries are the positive derivatives with respect to $\varepsilon_s$ in cases 2 and 4: in the case of a displacement error, the IoU measure is always improved by increasing the size of the detection. In the majority of cases a slight increase in detection size will thus improve the overlap; only in case 1, when the detection overlaps the annotation on both sides, does it decrease the overlap score. This is in contrast to a change in position, which is equally likely to decrease the overlap, depending on the direction selected. The result is a biased estimate of the object size.
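A quick numerical check of case 4 (our own sketch; the interval endpoints are arbitrary) confirms the sign pattern in the table: under a displacement error, growing the prediction raises the IoU while a further shift lowers it.

def iou_1d(a_s, a_e, d_s, d_e):
    inter = max(0.0, min(a_e, d_e) - max(a_s, d_s))
    union = (a_e - a_s) + (d_e - d_s) - inter
    return inter / union

a_s, a_e = 0.2, 0.6            # annotation interval
d_s, d_e = 0.3, 0.7            # prediction, case 4: D_s > A_s and D_e > A_e
eps = 1e-3

base = iou_1d(a_s, a_e, d_s, d_e)
grown = iou_1d(a_s, a_e, d_s - eps, d_e + eps)     # size perturbation eps_s
shifted = iou_1d(a_s, a_e, d_s + eps, d_e + eps)   # position perturbation eps_p

print(grown - base > 0)    # True: over-estimating the size is rewarded
print(shifted - base < 0)  # True: a further displacement is penalized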


3.2 Unbiased intersection over union

In order to remove this bias, we also account for the true negatives, that is, the parts of the image that are neither annotated as belonging to the target nor considered to be part of the object by the tracker. We do this by computing an IoU score for the object as usual, but also the inverse IoU, that is, the IoU with respect to the background. We then weight these two together using the relative weights $w_o$ and $w_{bg}$, derived from the size of the object in the image. The new unbiased overlap metric is:

$$ O = \frac{A_{ad}}{A_{ad} + A_{a\bar{d}} + A_{\bar{a}d}} w_o + \frac{A_{\bar{a}\bar{d}}}{A_{\bar{a}\bar{d}} + A_{a\bar{d}} + A_{\bar{a}d}} w_{bg} \qquad (7) $$

It is now no longer possible to simply increase the bounding box size and obtain a better IoU, since excessive background will be penalized by the second term. The severity of the penalty is balanced by the $w_{bg}$ factor. All that remains is to set the corresponding weights in a principled manner. A naive approach would be to set the weighting based on the relative size of the object in the image. However, since we wish to equalize the impact of wrongly estimating the size in the case of displacement errors, we can use this requirement to calculate a better weighting. We do this by returning to the one-dimensional case used before, but with our new measure that includes the background:

$$ O_{bg} = \frac{I_{\bar{a}\bar{d}}}{I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d}} = \frac{1 - D_e + A_s}{1 - A_e + D_s} \qquad (8) $$

The background regions are obtained as the image minus the annotated and the predicted areas, respectively; as in figure 2, the size of our image is 1. For the IoU with the background, we repeat all derivatives from above. The two most interesting cases, 2 and 4, result in

case  $\varepsilon_s$
2.    $-\dfrac{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{\bar{a}\bar{d}}}{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d})^2} < 0$
4.    $-\dfrac{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{\bar{a}\bar{d}}}{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d})^2} < 0 \qquad (9)$

Combining this with the size derivative of the IoU in cases 2 and 4, we obtain the following requirement for the weights $w_o$ and $w_{bg} = 1 - w_o$:

$$ 0 = w_o \frac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{ad}}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2} - (1 - w_o) \frac{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d}) - I_{\bar{a}\bar{d}}}{(I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d})^2} \qquad (10) $$

Simplifying this expression with some algebraic manipulation gives the weight for the annotation as:

$$ w_o = \frac{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2}{(I_{ad} + I_{a\bar{d}} + I_{\bar{a}d})^2 + (I_{\bar{a}\bar{d}} + I_{a\bar{d}} + I_{\bar{a}d})^2} \qquad (11) $$


Figure 3: Histogram over the ratio of the image covered by the annotated object for common tracking datasets (OTB100 (Wu et al., 2015), VOT 2015 (Kristan et al., 2015), VOT-TIR 2015 (Felsberg et al., 2015), VOT 2016 (Kristan et al., 2016)). The vast majority of frames in all datasets have objects covering less than a few percent of the image. In close to 100k frames from 4 different datasets, none contains an object covering more than 50% of the image area.

This gives an unbiased overlap estimate for an image of finite size according to (7) when using the derived weights for foreground and background.
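A minimal sketch of the resulting score, written for the one-dimensional unit-length image used in the derivation (function and variable names are ours; the two-dimensional version follows by replacing interval lengths with the corresponding areas):

def unbiased_overlap_1d(a_s, a_e, d_s, d_e, image_len=1.0):
    i_ad = max(0.0, min(a_e, d_e) - max(a_s, d_s))                 # true positive
    i_a_notd = (a_e - a_s) - i_ad                                  # false negative
    i_nota_d = (d_e - d_s) - i_ad                                  # false positive
    i_nota_notd = image_len - (a_e - a_s) - (d_e - d_s) + i_ad     # true negative

    fg_union = i_ad + i_a_notd + i_nota_d
    bg_union = i_nota_notd + i_a_notd + i_nota_d

    w_o = fg_union ** 2 / (fg_union ** 2 + bg_union ** 2)          # Eq. (11)
    o_fg = i_ad / fg_union                                         # standard IoU
    o_bg = i_nota_notd / bg_union if bg_union > 0 else 1.0         # background IoU
    return w_o * o_fg + (1.0 - w_o) * o_bg                         # Eq. (7)

print(unbiased_overlap_1d(0.3, 0.7, 0.3, 0.7))  # perfect prediction: 1.0
print(unbiased_overlap_1d(0.3, 0.7, 0.0, 1.0))  # whole-image prediction: below its plain IoU of 0.4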

4 Experimental Evaluation

First we investigate the statistics of current tracking datasets with respect to object size, and conclude that the distribution of relative object sizes is significantly skewed towards smaller objects.

Surprisingly, current tracking benchmark datasets (Wu et al., 2015), (Kristan et al., 2015), (Kristan et al., 2016) contain almost no frames where the tracked object covers a significant portion of the image; a histogram over the ratio of the image covered by the annotated object can be seen in figure 3. Constructing an entirely new dataset from scratch is out of the scope of this work; instead we derive a new dataset by cropping parts of the image around the object. This effectively increases the relative size of the object in each video.
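The exact cropping procedure is not spelled out in detail; the sketch below shows one plausible reading (assumptions ours: frames as NumPy arrays, ground truth as (x, y, w, h) boxes, and a per-side image-to-object ratio), in which a window centred on the target is cut out so that the target occupies a chosen fraction of the new frame.

import numpy as np

def crop_around_target(frame, box, image_to_object_ratio=1.5):
    # frame: H x W x 3 array, box: (x, y, w, h) ground truth, ratio >= 1 per side.
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    x0 = int(max(0, cx - w * image_to_object_ratio / 2.0))
    y0 = int(max(0, cy - h * image_to_object_ratio / 2.0))
    x1 = int(min(frame.shape[1], cx + w * image_to_object_ratio / 2.0))
    y1 = int(min(frame.shape[0], cy + h * image_to_object_ratio / 2.0))
    cropped = frame[y0:y1, x0:x1]
    new_box = (x - x0, y - y0, w, h)   # ground truth expressed in crop coordinates
    return cropped, new_box

# Example on a synthetic frame: the object becomes much larger relative to the image.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
cropped, gt = crop_around_target(frame, box=(200, 150, 120, 90), image_to_object_ratio=1.5)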

We experimentally validate our unbiased intersection over union score in two ways. First, we generate a large number of synthetic scenarios where the tracker prediction has the correct center coordinate but varying size. Second, we compare the performance of the well-known state-of-the-art tracker CCOT (Danelljan et al., 2016) with a naive method that always outputs the entire image, on a set of sequences with a wide range of object sizes.

In order to demonstrate the bias inherent in the traditional IoU, we compare the overlap given by it with that of our new overlap score. From figure 4 it is apparent that the penalty for an excessively large bounding box decreases with increased size of the box, until the size of the box saturates, at an IoU overlap of 0.6. For our unbiased overlap score the penalty for increasing the size decreases far more rapidly, at a rate similar to that of decreasing the size of the bounding box, until it saturates at a lower point. The bounding box size saturates as its edge hits the edge of the image and is truncated. Truncating the bounding box is reasonable since the size of the image is known, and no annotations exist outside the image. For the IoU this means that the lowest possible score is the ratio between the object area and the total image area. For our proposed overlap score the saturation point is much lower, as the minimal value is scaled by the size of the object relative to the image.

Figure 4: Scale factor plotted against overlap for a correctly centered bounding box. The right side of the curve clearly shows that the IoU score (blue) is not symmetric: while over-estimating the size of the object is penalized, the penalty is not as harsh as for under-estimation. Our unbiased score (red), while not perfectly symmetric, is still significantly better, particularly for larger objects.
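The synthetic sweep behind figure 4 can be reproduced in spirit with the following self-contained one-dimensional sketch (our own reconstruction; the target size, scale grid and truncation rule are assumptions), which scores a correctly centred prediction of varying scale with both the plain IoU and the background-aware score of Eqs. (7) and (11).

def both_scores_1d(a_s, a_e, d_s, d_e, image_len=1.0):
    i_ad = max(0.0, min(a_e, d_e) - max(a_s, d_s))
    fn = (a_e - a_s) - i_ad
    fp = (d_e - d_s) - i_ad
    tn = image_len - (a_e - a_s) - (d_e - d_s) + i_ad
    fg_u, bg_u = i_ad + fn + fp, tn + fn + fp
    iou = i_ad / fg_u
    w_o = fg_u ** 2 / (fg_u ** 2 + bg_u ** 2)
    ours = w_o * iou + (1.0 - w_o) * (tn / bg_u if bg_u > 0 else 1.0)
    return iou, ours

a_s, a_e = 0.2, 0.8                           # large, centred target (60% of the image)
for scale in (0.5, 0.75, 1.0, 1.25, 1.5):
    half = (a_e - a_s) * scale / 2.0
    d_s = max(0.0, 0.5 - half)                # centred prediction, truncated at the
    d_e = min(1.0, 0.5 + half)                # image borders as in the experiment
    print(scale, both_scores_1d(a_s, a_e, d_s, d_e))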

In order to show the impact of the bias in the IoU in a more realistic situation, we generate a number of sequences with varying object size. The sequences are generated from the BlurCar1 sequence by cropping a part of the image around the tracked object, effectively zooming in on the tracked object. We compare the performance of the state-of-the-art CCOT (Danelljan et al., 2016) (the VOT2016 winner (Kristan et al., 2016)) with a naive baseline tracker. The naive baseline tracker always outputs the full image as its prediction, except for the first row and column of pixels. The average overlap for the frame tracker and the CCOT is shown in figure 5; the same plot using our unbiased score is shown in figure 6. An example of a generated frame can be seen in figure 1.

Figure 5: Performance of the CCOT and the full-frame tracker with respect to relative object size using the standard IoU score. For large objects covering most of the image, the naive frame tracker outperforms the state-of-the-art CCOT. Once the objects are approximately 80% of the image, the CCOT approach begins to outperform the naive frame tracker. However, the naive frame tracker still obtains a respectable overlap until the object is smaller than 50% of the image.

While the CCOT obtains near-perfect results on the sequence, it does not track every frame perfectly. For an object covering the entire frame, this leads to an overlap slightly below 1. For a ratio between image and object close to 1, the naive method outperforms the CCOT regardless of the score used, which is reasonable as the object covers the entire image. However, it continues to outperform the CCOT until the ratio is slightly below 1.2 when using the traditional IoU score (figure 5). With an object covering 70% of the image the IoU is still at 0.7, only 0.1 less than that of the CCOT, despite the naive method not performing any tracking at all. As the size of the object decreases so does the IoU, but it remains quite high even for smaller objects: when the object covers only half the image by area the IoU is still 0.3, despite the prediction covering twice as many pixels as the ground truth.
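For reference, a sketch of the naive baseline and of the per-sequence comparison (our own formulation; the box format and function names are assumptions) looks as follows; any per-frame overlap function, the plain IoU or the unbiased score, can be plugged in as score_fn.

def naive_prediction(image_w, image_h):
    # Full image except the first row and column of pixels, as described above.
    return (1, 1, image_w - 1, image_h - 1)                  # (x, y, w, h)

def average_overlap(ground_truth, predictions, score_fn):
    # ground_truth, predictions: per-frame (x, y, w, h) boxes of equal length.
    scores = [score_fn(gt, pred) for gt, pred in zip(ground_truth, predictions)]
    return sum(scores) / len(scores)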

When performing the same experiment using our proposed overlap score, the naive tracker is severely penalized for over-estimating the object size. The plot corresponding to figure 5 can be seen in figure 6. Here the overlap score for the naive method is only higher than that of the CCOT in those cases where the object covers practically the entire frame (image-to-object ratio less than 1.05). In such situations even a minor mistake in positioning is penalized more harshly. Once the object becomes relatively smaller, the CCOT tracker begins to significantly outperform the naive method. Finally, the penalty for the naive method is far more significant, making the performance difference far more obvious than when using the IoU metric.

Figure 6: Performance of the CCOT and the full-frame tracker for relative object sizes using our proposed score. At lower relative size (larger object), the naive frame tracker outperforms the state-of-the-art CCOT approach as it is guaranteed to cover the entire object, while the CCOT typically has some offset error. At smaller object sizes, our proposed score heavily penalizes the naive frame tracker.

Figure 7: Example frames from the BlurCar1 sequence, with a naive method that outputs close to the entire image as each detection (red box). The ground-truth annotation is the blue box. Due to severe motion blur and highly irregular movements in the sequence, tracking is difficult. The traditional IoU score for this frame is 0.26 (left), while our new unbiased metric provides a far lower score of 0.11 for both the left and right images. This suggests that using the IoU is not optimal in many cases.

In figure 7 we show some qualitative examples of frames from the cropped BlurCar1 sequence. As the video is extremely unstable, tracking is difficult due to motion blur and sudden movements. Here the predicted bounding box generated by the naive tracker obtains a decent score of 0.22, despite always outputting the entire frame. When instead using our unbiased score, the penalty for over-estimating the object size is severe enough that the overlap score is more than halved; the IoU gives close to twice the overlap score compared to our approach.

5 Conclusions and further work

We have proven that the traditionally used IoU score is biased with respect to over-estimation of object sizes. We demonstrate theoretically that this bias exists, and derive a new unbiased overlap score. We note that most tracking datasets are heavily biased in favor of smaller objects, and construct a new dataset by cropping parts of images at varying sizes. This highlights a major issue with current tracking benchmarks, as situations with large objects directly correspond to situations where the tracked objects are close to the camera. We demonstrate the effect of using a biased metric in situations where the tracked object covers the majority of the image, and compare with our new unbiased score. Finally, we have demonstrated the effect of introducing larger objects into tracking sequences by generating such a sequence and comparing the performance of a stationary tracker with that of a state-of-the-art method. While the CCOT significantly outperforms the stationary tracker for smaller objects (as expected), for larger objects the naive approach of simply outputting the entire image is quite successful. In the future we aim to investigate the effect of this bias in object detection scenarios. It would also be relevant to construct a new tracking dataset where the distribution of tracked object sizes is more even than what is currently typical.

Acknowledgements: This work was supported by VR starting Grant (2016-05543), Vetenskapsrådet through the framework grant EMC2, and the Wallenberg Autonomous Systems and Software Program (WASP).


REFERENCES

Danelljan, M., Robinson, A., Khan, F. S., and Felsberg, M. (2016). Beyond correlation filters: Learning continuous convolution operators for visual tracking. In European Conference on Computer Vision, pages 472–488. Springer.

Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2008). The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results.

Felsberg, M., Berg, A., Häger, G., Ahlberg, J., Kristan, M., Matas, J., Leonardis, A., Čehovin, L., Fernández, G., Vojíř, T., et al. (2015). The thermal infrared visual object tracking VOT-TIR2015 challenge results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 76–88.

Felsberg, M., Kristan, M., Matas, J., Leonardis, A., Pflugfelder, R., Häger, G., Berg, A., Eldesokey, A., Ahlberg, J., Čehovin, L., Vojíř, T., Lukežič, A., and Fernández, G. (2016). The thermal infrared visual object tracking VOT-TIR2016 challenge results. In Proceedings, European Conference on Computer Vision (ECCV) Workshops, pages 824–849.

Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2):37–50.

Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., Vojíř, T., Häger, G., Lukežič, A., and Fernández, G. (2016). The visual object tracking VOT2016 challenge results. In Proceedings, European Conference on Computer Vision (ECCV) Workshops.

Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Čehovin, L., Fernández, G., Vojíř, T., Häger, G., Nebehay, G., and Pflugfelder, R. (2015). The visual object tracking VOT2015 challenge results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 1–23.

Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Čehovin, L., Nebehay, G., Vojíř, T., Fernández, G., and Lukežič, A. (2014). The visual object tracking VOT2014 challenge results. In Proceedings, European Conference on Computer Vision (ECCV) Visual Object Tracking Challenge Workshop, Zurich, Switzerland.

Powers, D. M. W. (2011). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. International Journal of Machine Learning Technology, 2(1):37–63.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252.

Wu, Y., Lim, J., and Yang, M.-H. (2013). Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2411–2418.

Wu, Y., Lim, J., and Yang, M.-H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).