deep learning a brief introductionvisao/tld_presentation.pdf · introduction long term tracking...
TRANSCRIPT
![Page 1: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/1.jpg)
TRACKING-LEARNING-DETECTION
V. H. AYMA
Laboratório de Visão Computacional Departamento de Engenharia Elétrica
PUC – Rio
![Page 2: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/2.jpg)
2
CONTENT
1. INTRODUCTION
2. TLD FRAMEWORK • TRACKING
• DETECTION
• LEARNING
![Page 3: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/3.jpg)
3
CONTENT
1. INTRODUCTION
2. TLD FRAMEWORK
3. TRACKING
4. DETECTION
5. LEARNING
![Page 4: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/4.jpg)
4
INTRODUCTION
• TLD Framework1 was proposed by Zdenek Kalal, Krystian Mikolajczyk and Jiri Matas.
– P-N Learning: Bootstrapping Binary Classifiers by Structural Constrains2.
– Forward-Backward Error: Automatic Detection of Tracking Failures3.
1) Z. Kalal, K. Mikolajczyk and J. Matas, “Tracking-Learning-Detection”, Pattern Analysis and Machine
Intelligence, 2011.
2) Z. Kalal, K. Mikolajczyk and J. Matas, “Forward-Backward Error: Automatic Detection of Tracking Failures”, International Conference on Pattern Recognition, 2010, pp. 23-26.
3) Z. Kalal, J. Matas and K. Mikolajczyk, “P-N Learning: Bootstrapping Binary Classifiers by Structural Constrains”, Conference on Computer Vision and Pattern Recognition, 2010.
![Page 5: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/5.jpg)
5
INTRODUCTION LONG TERM TRACKING
The goal is to determine the object’s bounding box or indicate the object is not visible in the frames that fallows.
![Page 6: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/6.jpg)
6
INTRODUCTION LONG TERM TRACKING
Long-term tracking algorithms must:
• Detect the object when it reappears in the camera’s field of view.
– Object may change its appearance during its absence, thus initial appearance becomes irrelevant.
• Handle:
– Scale variations.
– Illumination variations.
– Occlusions.
– Background clutter.
• Operate at frame rate (real time).
![Page 7: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/7.jpg)
7
INTRODUCTION LONG TERM TRACKING
Long-term tracking algorithms must:
• Detect the object when it reappears in the camera’s field of view.
– Object may change its appearance during its absence, thus initial appearance becomes irrelevant.
• Handle:
– Scale variations.
– Illumination variations.
– Occlusions.
– Background clutter.
• Operate at frame rate (real time).
![Page 8: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/8.jpg)
8
INTRODUCTION LONG TERM TRACKING
Long-term tracking algorithms must:
• Detect the object when it reappears in the camera’s field of view.
– Object might change its appearance during its absence, thus initial appearance becomes irrelevant.
• Handle:
– Scale variations.
– Illumination variations.
– Occlusions.
– Background clutter.
• Operate at frame rate (real time).
![Page 9: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/9.jpg)
9
INTRODUCTION LONG TERM TRACKING
TRACKING (Estimates location)
+ Requires initialization
+ Produces smooth trajectories
+ Reasonably fast
‒ Accumulate errors during track (drifts)
‒ Fails when the object disappears
‒ Doesn’t have a post failure recovery mechanism
DETECTION (Finds the best match)
‒ Requires off-line training
‒ Works over a know object
‒ Produces a sort of discrete trajectories
‒ Computationally expensive
+ Doesn’t drift
+ Doesn’t fail if the object disappears
LONG-TERM TRACKING
![Page 10: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/10.jpg)
10
INTRODUCTION LONG TERM TRACKING
TRACKING (Estimates location)
+ Requires initialization
+ Produces smooth trajectories
+ Reasonably fast
‒ Accumulate errors during track (drifts)
‒ Fails when the object disappears
‒ Doesn’t have a post failure recovery mechanism
DETECTION (Finds the best match)
‒ Requires off-line training
‒ Works over a know object
‒ Produces a sort of discrete trajectories
‒ Computationally expensive
+ Doesn’t drift
+ Doesn’t fail if the object disappears
LONG-TERM TRACKING
![Page 11: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/11.jpg)
11
INTRODUCTION LONG TERM TRACKING
TRACKING (Estimates location)
+ Requires initialization
+ Produces smooth trajectories
+ Reasonably fast
‒ Accumulate errors during track (drifts)
‒ Fails when the object disappears
‒ Doesn’t have a post failure recovery mechanism
DETECTION (Finds the best match)
‒ Requires off-line training
‒ Works over a know object
‒ Produces a sort of discrete trajectories
‒ Computationally expensive
+ Doesn’t drift
+ Doesn’t fail if the object disappears
LONG-TERM TRACKING
![Page 12: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/12.jpg)
12
INTRODUCTION LONG TERM TRACKING
• Interaction between tracking and detection may benefit long-term tracking.
TRACKER (Estimates location)
DETECTOR (Finds the best match)
REINITIALIZE
TRAINING DATA
![Page 13: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/13.jpg)
13
INTRODUCTION LONG TERM TRACKING
• Interaction between tracking and detection may benefit long-term tracking.
• How reliable is the training data?
TRACKER (Estimates location)
DETECTOR (Finds the best match)
REINITIALIZE
TRAINING DATA
![Page 14: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/14.jpg)
14
INTRODUCTION LONG TERM TRACKING
• Long-term tracking can be decomposed into:
• Tracking: follows the object from frame to frame.
TRACKING
![Page 15: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/15.jpg)
15
INTRODUCTION LONG TERM TRACKING
• Long-term tracking can be decomposed into:
• Tracking: follows the object from frame to frame.
• Detection: localizes the appearances that have been observed so far.
TRACKING
DETECTION
![Page 16: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/16.jpg)
16
INTRODUCTION LONG TERM TRACKING
• Long-term tracking can be decomposed into:
• Tracking: follows the object from frame to frame.
• Detection: localizes the appearances that have been observed so far.
TRACKING
DETECTION corrects
![Page 17: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/17.jpg)
17
INTRODUCTION LONG TERM TRACKING
• Long-term tracking can be decomposed into:
• Tracking: follows the object from frame to frame.
• Detection: localizes the appearances that have been observed so far.
• Learning: estimates the detector errors.
TRACKING
LEARNING
DETECTION corrects
updates
![Page 18: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/18.jpg)
18
INTRODUCTION LONG TERM TRACKING
• Long-term tracking can be decomposed into:
• Tracking: follows the object from frame to frame.
• Detection: localizes the appearances that have been observed so far.
• Learning: estimates the detector errors.
– Should deal with an arbitrary complex video stream.
– Mustn’t degrade the detector with irrelevant information.
– Operates in real time.
TRACKING
LEARNING
DETECTION corrects
updates
![Page 19: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/19.jpg)
19
CONTENT
1. INTRODUCTION
2. TLD FRAMEWORK
• TRACKING
• DETECTION
• LEARNING
![Page 20: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/20.jpg)
20
TLD FRAMEWORK
Learning
Detection Tracking
Fragments of trajectory
Detections
Re-initialization
Training data
![Page 21: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/21.jpg)
21
TLD FRAMEWORK
Learning
Detection Tracking
Fragments of trajectory
Detections
Re-initialization
Training data
![Page 22: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/22.jpg)
22
TLD FRAMEWORK ADAPTIVE TRACKING
• Adaptive tracking eventually fails due to the insertion of background information into its model, commonly known as drifting.
![Page 23: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/23.jpg)
23
TLD FRAMEWORK ADAPTIVE TRACKING
• Adaptive tracking eventually fails due to the insertion of background information into its model, commonly known as drifting.
• Idea: recognize tracking failures and update only if the tracking is correct.
![Page 24: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/24.jpg)
24
TLD FRAMEWORK ADAPTIVE TRACKING
• Which of these points was tracked correctly?
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
1
2
![Page 25: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/25.jpg)
25
TLD FRAMEWORK ADAPTIVE TRACKING
• Which of these points was tracked correctly?
• Measure the similarity between patches around the points.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
![Page 26: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/26.jpg)
26
TLD FRAMEWORK ADAPTIVE TRACKING
• Which of these points was tracked correctly?
• Measure the similarity between patches around the points.
– Typically Normalized Cross Correlation and Sum of Squared Errors between patches are embedded into tracking algorithms.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
![Page 27: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/27.jpg)
27
TLD FRAMEWORK ADAPTIVE TRACKING
• Which of these points was tracked correctly?
• Measure the similarity between patches around the points.
– Typically Normalized Cross Correlation and Sum of Squared Errors between patches are embedded into tracking algorithms.
• Or, compute the Forward-Backward Error.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
![Page 28: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/28.jpg)
28
TLD FRAMEWORK ADAPTIVE TRACKING
FORWARD-BACKWARD ERROR:
• Track points from frame t to frame t + 1, i.e., Forward tracking.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
![Page 29: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/29.jpg)
29
TLD FRAMEWORK ADAPTIVE TRACKING
FORWARD-BACKWARD ERROR:
• Track points from frame t to frame t + 1, i.e., Forward tracking.
• Track points from frame t + 1 to frame t, i.e., Backward tracking.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
![Page 30: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/30.jpg)
30
TLD FRAMEWORK ADAPTIVE TRACKING
FORWARD-BACKWARD ERROR:
• Track points from frame t to frame t + 1, i.e., Forward tracking.
• Track points from frame t + 1 to frame t, i.e., Backward tracking.
• Compute the error between initial points and backward points.
– Small errors = correctly tracked points.
Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.
framet framet+1
error2
error1
![Page 31: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/31.jpg)
31
TLD FRAMEWORK ADAPTIVE TRACKING
MEDIAN FLOW TRACKER framet framet+1
Initialize a grid
Track points between frames
Estimate Bounding box
Filter out 50% outliers
Estimate point reliability
![Page 32: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/32.jpg)
32
TLD FRAMEWORK
Learning
Detection Tracking
Fragments of trajectory
Detections
Re-initialization
Training data
![Page 33: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/33.jpg)
33
TLD FRAMEWORK DETECTION
• The goal is to discriminate the object from the background, i.e., model the appearance of the object.
• Let M be the object model, so that:
Where,
– p+, represents the object patches.
– p‒, are the background patches.
• New image patches can be classified as belonging to the object or the background using a Nearest Neighbor Classifier (NNC).
ppppppn1m1
,,,,,,,M
22
![Page 34: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/34.jpg)
34
TLD FRAMEWORK DETECTION
NEAREST NEIGHBOR CLASSIFIER
• Red and black points represent the object and background patches, respectively, in a d-dimensional space.
![Page 35: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/35.jpg)
35
TLD FRAMEWORK DETECTION
NEAREST NEIGHBOR CLASSIFIER • Use a relative similarity, Sr, to classify a
new patch (blue point), formally:
Where,
– S+, is the similarity with the positive
nearest neighbor.
– S–, is the similarity with the negative
nearest neighbor.
• Red and black points represent the object and background patches,
respectively, in a d-dimensional space.
Mp,Mp,
Mp,
SSS
Sr
![Page 36: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/36.jpg)
36
TLD FRAMEWORK DETECTION
NEAREST NEIGHBOR CLASSIFIER • Similarity between patches, S(pi, pj), is
given by:
Where,
– NCC, is the Normalized Cross Correlation between patches pi and pj.
• Red and black points represent the object and background patches, respectively, in a d-dimensional space.
1NCC5.0 jiji ,pp,ppS
![Page 37: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/37.jpg)
37
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 38: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/38.jpg)
38
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 39: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/39.jpg)
39
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 40: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/40.jpg)
40
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 41: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/41.jpg)
41
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 42: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/42.jpg)
42
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
![Page 43: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/43.jpg)
43
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
• Evaluate the NNC over every possible patch in the image is an unfeasible task as it involves evaluation of the relative similarity.
![Page 44: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/44.jpg)
44
TLD FRAMEWORK DETECTION
• Image patches are generated from the initial bounding box, for example, a QVGA image (240x320) with the following parameters:
– Scale step = 1.2,
– Horizontal step = 0.1(object’s width),
– Vertical step = 0.1(object’s height),
– Minimal bounding box size = 20,
Produces around 50K bounding boxes.
• Evaluate the NNC over every possible patch in the image is an unfeasible task as it involves evaluation of the relative similarity.
USE A CASCADE OF CLASSIFIERS INSTEAD!
![Page 45: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/45.jpg)
45
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS
PATCH VARIANCE
ENSEMBLE CLASSIFIER NN CLASSIFIER
REJECTED PATCHES
ACCEPTED PATCHES
![Page 46: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/46.jpg)
46
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS
• Rejects patches that have variance smaller than 50% of the initial bounding box variance.
PATCH VARIANCE
ENSEMBLE CLASSIFIER NN CLASSIFIER
REJECTED PATCHES
ACCEPTED PATCHES
![Page 47: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/47.jpg)
47
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS
• An Ensemble Classifier consist of n base classifiers.
• Each base classifier, ci, performs a number of pixel comparisons, fk, for each patch, resulting in a binary code, xi.
PATCH VARIANCE
ENSEMBLE CLASSIFIER NN CLASSIFIER
REJECTED PATCHES
ACCEPTED PATCHES
……
![Page 48: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/48.jpg)
48
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS • Consider an image patch, p, and an Ensemble Classifier which has a
single base classifier to perform four pixel comparisons, fk, so that:
• Finally, the binary code x is equal to 5.
otherwise ,0
positionposition ,1 ,, yellowkredk
k
ppf
f1 = 0 f2 = 1 f3 = 0 f4 = 1
![Page 49: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/49.jpg)
49
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS
• An Ensemble Classifier consist of n base classifiers.
• Each base classifier, ci, performs a number of pixel comparisons, fk, for each patch, resulting in a binary code, xi.
• An image patch is classified as “object” if the mean of the posterior probabilities:
is greater than 0.5.
PATCH VARIANCE
ENSEMBLE CLASSIFIER NN CLASSIFIER
REJECTED PATCHES
ACCEPTED PATCHES
……
n
i
ii xyPn
P1
1
![Page 50: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/50.jpg)
50
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS • The posterior probability for the i-th base classifier is given by:
Where, n+(xi) and n‒(xi) correspond to the number of positive and negative patches, respectively, that were assigned the same binary code.
ii
iii
xnxn
xnxyP
![Page 51: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/51.jpg)
51
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS • For example, consider a trained base classifier and a xi = 5, then:
6.010
6
46
6511
xyP
![Page 52: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/52.jpg)
52
TLD FRAMEWORK DETECTION
CASCADE OF CLASSIFIERS
PATCH VARIANCE
ENSEMBLE CLASSIFIER NN CLASSIFIER
REJECTED PATCHES
ACCEPTED PATCHES
……
![Page 53: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/53.jpg)
53
TLD FRAMEWORK
Learning
Detection Tracking
Fragments of trajectory
Detections
Re-initialization
Training data
![Page 54: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/54.jpg)
54
TLD FRAMEWORK LEARNING
• Use online learning techniques to improve the performance of the detector.
• Online learning is challenging…
– Lack of training data
– Unlabeled data
– Requires real time processing.
• Assuming that these challenges are overcome…
– Identify the errors committed by the detector in order to update it and avoid these errors in the future.
![Page 55: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/55.jpg)
55
TLD FRAMEWORK LEARNING
• Use online learning techniques to improve the performance of the detector.
• Online learning is challenging…
– Lack of training data
– Unlabeled data
– Requires real time processing.
• Assuming that these challenges are overcome…
– Identify the errors committed by the detector in order to update it and avoid these errors in the future.
P-experts N-experts
![Page 56: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/56.jpg)
56
TLD FRAMEWORK LEARNING
– Identify the errors committed by the detector in order to update it and avoid these errors in the future.
Missed detection
False alarms
![Page 57: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/57.jpg)
57
TLD FRAMEWORK LEARNING
– N-expert explores the spatial structure in the data in order to identify false positives (false alarms).
![Page 58: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/58.jpg)
58
TLD FRAMEWORK LEARNING
– P-expert explores the temporal structure in the data in order to identify false negatives (missed detections).
![Page 59: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/59.jpg)
59
TLD FRAMEWORK LEARNING
PN – LEARNING: INITIAL LEARNING
Training Set Supervised
Training
Classifier
labeled data
classifier
parameters
![Page 60: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/60.jpg)
60
TLD FRAMEWORK LEARNING
PN – LEARNING
Training Set Supervised
Training
Classifier
P-N Experts
unlabeled data labels
classifier
parameters
[‒] examples
[+] examples
![Page 61: DEEP LEARNING A brief introductionvisao/TLD_Presentation.pdf · INTRODUCTION LONG TERM TRACKING TRACKING (Estimates location) + Requires initialization ... A visual tracker that learns](https://reader030.vdocuments.us/reader030/viewer/2022040409/5ec655f94faae761ee4db5f3/html5/thumbnails/61.jpg)
61
TLD FRAMEWORK LEARNING
PN – LEARNING
Training Set Supervised
Training
Classifier
P-N Experts
labeled data
unlabeled data labels
classifier
parameters
[‒] examples
[+] examples