
Page 1: Institute of Informatics – Institute of Neuroinformatics, Deep Learning, rpg.ifi.uzh.ch/docs/teaching/2020/dl_tutorial_slides.pdf (2020-12-03)

Deep Learning

Daniel Gehrig

Institute of Informatics – Institute of Neuroinformatics


Page 2:

Outline
• Introduction
• Motivation and history
• Supervised Learning
  • Semantic Segmentation
  • Image Description
  • Image Localization
• Unsupervised Learning
  • Depth estimation
  • Optical Flow estimation
  • Depth and egomotion
• Applications to Computer Vision
  • LIFT: Learned Invariant Feature Transform
  • SuperPoint
  • Place Recognition - NetVLAD
• Conclusions

Relevant for the exam

Page 3:

The Deep Learning Revolution

Medicine · Media & Entertainment · Autonomous Driving · Surveillance & Security

Page 4:

Some History

1958: Perceptron
1969: Critics
1974: Back-Propagation
AI Winter
1995: SVM
1998: Convolutional Neural Networks for Handwritten Digit Recognition
2006: Restricted Boltzmann Machines
2012: AlexNet

Page 5:

What changed?

• Hardware Improvements

• Big Data Available

• Algorithmic Progress


Page 6:

Image Classification
The task of assigning an input image a label from a fixed set of categories.

Slide adapted from CNNs for Visual Recognition (Stanford)

Page 7:

The semantic gap
• What computers see versus what we see


Page 8:

Classification Challenges

Slide adapted from CNNs for Visual Recognition (Stanford)

Directly specifying what a category looks like is impossible.

We need to use a data-driven approach.


Page 9:

Supervised Learning

Find a function f(x, θ) that imitates a ground-truth signal:
• θ: the function parameters, or weights
• f(x_i, θ): N numbers representing class scores, e.g. the prediction (0.1, 0.7, …, 0.0) versus the ground truth y_i = (1.0, 0.0, …, 0.0)
• Loss(f(x_i, θ), y_i) is used to update θ
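The predict-compare-update cycle on this slide can be sketched in a few lines; the quadratic loss and the toy linear model below are illustrative assumptions, not the networks used later in the deck.

```python
import numpy as np

def f(x, theta):
    # toy model: class scores are a linear function of the input
    return theta @ x

def loss(pred, y):
    # quadratic loss between prediction and ground-truth label vector
    return 0.5 * np.sum((pred - y) ** 2)

x_i = np.array([1.0, 2.0])    # one training input
y_i = np.array([1.0, 0.0])    # its one-hot ground-truth label
theta = np.zeros((2, 2))      # function parameters ("weights")
mu = 0.1                      # learning rate

for _ in range(100):
    pred = f(x_i, theta)
    grad = np.outer(pred - y_i, x_i)  # d loss / d theta for this model
    theta = theta - mu * grad         # update step driven by the loss

print(loss(f(x_i, theta), y_i))  # near 0: f now imitates the ground truth
```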

Page 10:

Machine Learning Keywords

• Loss: quantifies how good the parameters θ are

• Optimization: the process of finding the θ that minimize the loss

• Function: problem modelling → deep networks are highly non-linear functions f(x, θ)

Slide adapted from CNNs for Visual Recognition (Stanford)

Page 11:

Classifiers: K-Nearest neighbor

𝑓(𝒙, 𝜃) = label of the K training examples nearest to 𝒙

• How fast is training? How fast is testing? O(1) and O(n), respectively.

• What is a good distance metric? What K should be used?

Features are represented in the descriptor space (figure: a test example among training examples from class 1 and class 2).
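A minimal k-NN sketch makes the O(1)/O(n) answer above concrete; the toy 2-D points below stand in for descriptors and are purely illustrative.

```python
import numpy as np
from collections import Counter

class KNN:
    def fit(self, X, y):
        # "training" is just storing the data: O(1) up to the copy
        self.X, self.y = np.asarray(X), np.asarray(y)

    def predict(self, x, k=3):
        # testing scans all n training examples: O(n)
        d = np.linalg.norm(self.X - x, axis=1)        # Euclidean distance metric
        nearest = self.y[np.argsort(d)[:k]]           # labels of the k nearest
        return Counter(nearest).most_common(1)[0][0]  # majority vote

clf = KNN()
clf.fit([[0, 0], [0, 1], [5, 5], [6, 5]], [1, 1, 2, 2])
print(clf.predict(np.array([0.2, 0.4])))  # → 1 (class-1 neighbours dominate)
```

Both the distance metric and K remain design choices, exactly as the slide's open questions suggest.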

Page 12:

Classifiers: Linear

• Find a linear function to separate the classes:

f(x, θ) = sgn(wᵀx + b)

• What is now θ? What is the dimensionality of images?
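As a sketch of the two questions above: θ is the pair (w, b), and a flattened 32×32 RGB image has 3072 entries. The random data below is purely illustrative.

```python
import numpy as np

# For a 32x32 RGB image the flattened input has 32*32*3 = 3072 entries,
# so theta = (w, b) holds 3072 weights plus one bias per binary classifier.
rng = np.random.default_rng(0)
w = rng.standard_normal(32 * 32 * 3)
b = 0.0

def f(x, w, b):
    # linear decision rule: sign of w^T x + b
    return np.sign(w @ x + b)

x = rng.standard_normal(32 * 32 * 3)  # a stand-in for a flattened image
print(f(x, w, b))                     # +1.0 or -1.0
```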

Page 13:

Classifiers: non-linear

Bad classifier (overfitting) vs. good classifier

• What is 𝑓(𝒙, 𝜃) ?


Page 14:

Biological Inspiration

[Figure: a biological neuron (dendrites, axon, terminal branches of the axon) next to its artificial counterpart: inputs x1, x2, …, xn weighted by w1, w2, …, wn and summed.]

f(x, θ) = F(Wx), where F is a non-linear activation function (step, ReLU, sigmoid)

The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Frank Rosenblatt (1958)

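A minimal sketch of f(x, θ) = F(Wx) with the three activation choices named above; the weights and inputs are made up for illustration.

```python
import numpy as np

def step(z):    return (z > 0).astype(float)
def relu(z):    return np.maximum(z, 0.0)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, W, F=step):
    # f(x, theta) = F(Wx): weighted sum of the inputs through a non-linearity
    return F(W @ x)

W = np.array([[0.5, -1.0, 2.0]])  # one neuron, weights w1..w3
x = np.array([1.0, 1.0, 1.0])     # inputs x1..x3
print(perceptron(x, W))           # step(1.5) = [1.]
```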

Page 15:

Multi Layer Perceptron

f(x, θ)

Non-linear Activation functions (ReLU, sigmoid, etc.)

Neural Networks and Deep Learning, Chapter 2, Michael Nielsen

Page 16:

Forward propagation

Forward Pass

Loss(f(x_i, θ), y_i)

Neural Networks and Deep Learning, Chapter 2, Michael Nielsen

Page 17:

Optimization: Back-propagation

Backward pass: compute the gradients of Loss(f(x_i, θ), y_i) with respect to all parameters and perform gradient descent:

θ_new = θ_old − μ∇Loss

Artificial Neural Networks, Back Propagation and the Kelley-Bryson Gradient Procedure, Stuart E. Dreyfus

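The update rule θ_new = θ_old − μ∇Loss can be shown end-to-end on a tiny two-layer network with a hand-written backward pass; the sizes, seed, and learning rate below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

# tiny 2-layer network on a single made-up training example
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 2)), rng.standard_normal((1, 3))
x, y = np.array([0.5, -0.2]), np.array([1.0])
mu = 0.5  # learning rate

for _ in range(200):
    # forward pass
    h = sigmoid(W1 @ x)
    pred = W2 @ h
    # backward pass: chain rule, layer by layer
    d_pred = pred - y                       # d loss / d pred (squared loss)
    dW2 = np.outer(d_pred, h)
    d_h = W2.T @ d_pred * h * (1 - h)       # back through the sigmoid
    dW1 = np.outer(d_h, x)
    # gradient-descent update: theta_new = theta_old - mu * grad
    W1, W2 = W1 - mu * dW1, W2 - mu * dW2

print(0.5 * np.sum((W2 @ sigmoid(W1 @ x) - y) ** 2))  # loss is near zero
```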

Page 18:

Problems of fully connected network

• Too many parameters → possible overfitting

• We are not using the fact that inputs are images!


Page 19:

Convolutional Neural Networks

Gradient-based learning applied to document recognition, Y. LeCun et al. (1998)

Page 20:

Going Deep

[Figure: a deep network assigns class scores to an input image over the classes Flower, Cup, Dog and Car; the scores shown are 62.7%, 15.6%, 11.5% and 10.2%.]

Page 21:

Why Deep?
• Inspired by the human visual system
• Learn multiple layers of transformations of the input
• Extract progressively more sophisticated representations

Page 22:

Supervised Learning

• In supervised learning it is assumed that we have access to both the input data (or images) and the ground-truth labels.

• Networks trained with supervision usually perform best.

• However, it is usually hard to generate this data, since it often has to be hand-labelled.

L(f(x, θ), y): the network maps an image x to a prediction f(x, θ), which is compared against the ground-truth label y.

Page 23:

Supervised Learning: Image Segmentation

Fully Convolutional Networks for Semantic Segmentation, J. Long, E. Shelhamer, 2015

Page 24:

Supervised Learning: Image Captioning

Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy et al., 2015

Page 25:

Supervised Learning: Image Localization

PlaNet - Photo Geolocation with Convolutional Neural Networks, Weyand et al., 2016

Page 26:

Unsupervised Learning

• In unsupervised learning we only have access to input data or images.

• Usually, these methods are more popular because they can use much larger datasets that do not need to be manually labelled.

L(f(x, θ)): the network maps an image x to a prediction f(x, θ), and the loss is computed without ground-truth labels.

Page 27:

Unsupervised Learning: Monocular Depth Estimation

Unsupervised Monocular Depth Estimation with Left-Right Consistency, Godard et al., 2017

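The supervision signal in such methods is photometric: warp one view into the other using the predicted disparity and penalize the reconstruction error. Below is a 1-D toy sketch of that idea (not Godard et al.'s actual loss, which adds left-right consistency and smoothness terms); the signals and disparity are synthetic.

```python
import numpy as np

def warp(right, disp):
    # sample the right view at x - disp(x): a 1-D stand-in for image warping
    xs = np.clip(np.arange(len(right)) - disp, 0, len(right) - 1)
    lo = np.floor(xs).astype(int)
    hi = np.clip(lo + 1, 0, len(right) - 1)
    w = xs - lo
    return (1 - w) * right[lo] + w * right[hi]  # linear interpolation

def photometric_loss(left, right, disp):
    # no depth labels needed: the supervision is the reconstruction error
    return np.mean(np.abs(left - warp(right, disp)))

left = np.sin(np.linspace(0, 6, 64))
right = np.roll(left, -3)  # the "right view": the same signal shifted 3 px
print(photometric_loss(left, right, np.full(64, 3.0)))  # small: correct disparity
print(photometric_loss(left, right, np.zeros(64)))      # larger: wrong disparity
```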

Page 28:

Unsupervised Learning: Structure from Motion

Unsupervised Learning of Depth and Ego-Motion from Video, Zhou et al., 2017

Page 29:

Unsupervised Learning: Dense Optical Flow

Characteristics of the learned flow:
• Robustness against light changes (census transform)
• Occlusion handling (bi-directional flow)
• Smooth flow

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss, Meister et al, 2018

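The census transform mentioned above encodes each pixel by intensity comparisons with its neighbours, so the encoding depends only on intensity ordering and is invariant to monotonic lighting changes. A 1-D sketch with made-up data:

```python
import numpy as np

def census_1d(signal, window=1):
    # encode each pixel by comparisons with its neighbours; the code depends
    # only on the intensity ordering, not on absolute brightness
    codes = []
    for i in range(window, len(signal) - window):
        bits = [signal[j] < signal[i]
                for j in range(i - window, i + window + 1) if j != i]
        codes.append(tuple(bits))
    return codes

img = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
brighter = 2.0 * img + 10.0                   # global illumination change
print(census_1d(img) == census_1d(brighter))  # True: codes are unchanged
```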

Page 30:

Unsupervised vs. Supervised learning

• Performance: supervised is usually better for the same dataset size; unsupervised is usually worse, but can outperform supervised methods due to larger data availability.

• Data availability: low for supervised, due to manual labelling; high for unsupervised, since no labelling is required.

• Training: simple for supervised, as ground truth gives a strong supervision signal; sometimes difficult for unsupervised, as loss functions have to be engineered to get good results.

• Generalizability: good for supervised, although sometimes the network learns to blindly copy the provided labels, leading to poor generalizability; better for unsupervised, since unsupervised losses often encode the task in a more fundamental way.

Page 31:

Applications to Computer Vision


Page 32:

Deep Descriptors: LIFT

LIFT Pipeline consists of 3 neural networks:

• A keypoint detector

• An orientation detector

• A descriptor generator

LIFT: Learned Invariant Feature Transform, Kwang Moo Yi et al., 2016

Applications to Computer Vision: Keypoint Detection and Description


Page 33:

LIFT Loss


Page 34:

LIFT Results
• Works better than SIFT! (well, on some datasets)


Page 35:

SuperPoint: Self-Supervised Interest Point Detection and Description
• SIFT and friends are complicated heuristic algorithms
• Still, SIFT is a hard-to-beat baseline for new methods
• Can we do better with a data-driven approach?
• SuperPoint shows how a convolutional neural network can be trained to predict keypoints and descriptors simultaneously.
• Its detector is less accurate than SIFT's, but its descriptor is shown to outperform SIFT in some scenarios.

Applications to Computer Vision: Keypoint Detection and Description

SuperPoint: Self-Supervised Interest Point Detection and Description, D. DeTone et al., CVPRW 2018

Page 36:

SuperPoint - Training

a) Train on a synthetic dataset to bootstrap the detector

b) Use this detector on real images and generate ground-truth correspondences by sampling perspective transformations

c) Jointly train both the detector and the descriptor


Page 37:

SuperPoint - Qualitative (Cherry Picked) Results


Page 38:

SuperPoint - Quantitative Results

- ε: pixel distance within which a keypoint counts as correct. SIFT is very accurate due to sub-pixel refinement.
- MLE (Mean Localization Error): lower is better; a metric for determining detector accuracy.
- NN mAP (nearest-neighbor mean average precision): measures the discriminativeness of the descriptors; higher is better.
- M. Score (Matching Score): evaluates detector and descriptor jointly on ground-truth correspondences.


Page 39:

Applications to Computer Vision: Place Recognition

Design an "image representation" extractor f(I, θ): both the query image and a geotagged image database are mapped into a common image-representation space, where the query is matched against the database.

Page 40:

NetVLAD
Mimic the classical pipeline with deep learning:
• Classical: extract local features (e.g. SIFT) from an image I, then aggregate them (BoW, VLAD, FV) into a representation F(I).
• NetVLAD: convolutional layers from AlexNet or VGGNet, followed by a trainable pooling layer.

Slide adapted from NetVLAD presentation, CVPR 2017

Page 41:

NetVLAD Loss
• Triplet loss formulation: matching samples are pulled together, while non-matching samples are pushed away by at least a margin.

Disclaimer: the actual NetVLAD loss is a slightly more complicated version of the one above.

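A basic margin triplet loss of the kind referred to on this slide can be sketched as follows; the toy embeddings are made up, and, as the disclaimer says, the actual NetVLAD loss is more involved.

```python
import numpy as np

def triplet_loss(q, pos, neg, margin=0.5):
    # pull matching pairs together and push non-matching pairs at least
    # `margin` further away than the match: max(0, d(q,p) - d(q,n) + m)
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(0.0, d_pos - d_neg + margin)

q   = np.array([1.0, 0.0])  # query representation
pos = np.array([0.9, 0.1])  # image of the same place
neg = np.array([0.0, 1.0])  # image of a different place
print(triplet_loss(q, pos, neg))  # 0.0: the negative is already far enough
```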

Page 42:

NetVLAD Results
• Code, dataset and trained network online: give it a try!
• http://www.di.ens.fr/willow/research/netvlad/

Query vs. top result; green: correct, red: incorrect.

Slide adapted from NetVLAD presentation, CVPR 2017

Page 43:

Deep Visual Odometry (DeepVO)

DeepVO End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks, S. Wang et al. ICRA 2017

- End-to-end trained method that computes the camera pose from images directly

- Encodes image pairs to geometric features using a convolutional neural network (CNN)

- Feeds these features to a recurrent neural network (RNN) which outputs the 6-DOF pose


Page 44:

Deep Visual Odometry (DeepVO) - Training

- The network is trained on the KITTI VO/SLAM [1] benchmark with 7410 samples. To deal with the small amount of data, they use a network pretrained on FlowNet [2].

- During training, DeepVO tries to minimize the difference between the ground-truth and predicted position and orientation (in Euler angles).

- The network is later tested on sequences of KITTI which were not seen during training.

[1] Are we ready for autonomous driving? The KITTI vision benchmark suite, A. Geiger et al., CVPR 2012.
[2] FlowNet: Learning Optical Flow with Convolutional Networks, A. Dosovitskiy et al., ICCV 2015.
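The loss equation itself did not survive extraction; a common form for this kind of objective is a mean squared error on positions plus a scaled mean squared error on Euler-angle orientations. The sketch below assumes that form, and the weight kappa is an illustrative value, not DeepVO's exact setting.

```python
import numpy as np

def pose_loss(pred_p, gt_p, pred_phi, gt_phi, kappa=100.0):
    # MSE on positions plus a weighted MSE on orientations (Euler angles);
    # kappa balances the two terms (its value here is an assumption)
    pos_err = np.mean(np.sum((pred_p - gt_p) ** 2, axis=1))
    ori_err = np.mean(np.sum((pred_phi - gt_phi) ** 2, axis=1))
    return pos_err + kappa * ori_err

pred_p   = np.array([[1.0, 0.0, 0.2]])   # predicted position (x, y, z)
gt_p     = np.array([[1.0, 0.0, 0.0]])   # ground-truth position
pred_phi = np.array([[0.01, 0.0, 0.0]])  # predicted Euler angles
gt_phi   = np.array([[0.0, 0.0, 0.0]])   # ground-truth Euler angles
print(pose_loss(pred_p, gt_p, pred_phi, gt_phi))
```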

Page 45:

Deep Visual Odometry (DeepVO) - Results

- DeepVO is compared to VISO2 [1] (Monocular and Stereo setup) in terms of rotation and translation error for different subtrajectory lengths and speeds.

- While DeepVO outperforms the monocular setup of VISO2 (VISO2_M), it is still outperformed by the stereo setup (VISO2_S).

- An important thing to note is that learning-based visual odometry is not yet solved, since many of these methods overfit to the driving scenario (2D motion). However, this field is growing rapidly.

[1] Are we ready for autonomous driving? The KITTI vision benchmark suite, A. Geiger et al., CVPR 2012.

Page 46:

Deep Learning Limitations

• Require lots of data to learn

• Difficult debugging and finetuning

• Poor generalization across similar tasks

[Cartoon: neural networks vs. practitioners]


Page 47:

Things to remember

• Deep Learning is able to extract meaningful patterns from data.

• It can be applied to a wide range of tasks.

• Artificial Intelligence ≠ Deep Learning


Page 48:

Come over for projects in DL!

http://rpg.ifi.uzh.ch/student_projects.php

Visit our webpage for projects!


Page 49:

Additional Readings

• Neural Networks and Deep Learning, by Michael Nielsen [Chapter 2]

• Practical Recommendations for Gradient-Based Training of Deep Architectures, Y. Bengio

• Deep Learning, I. Goodfellow, Y. Bengio, A. Courville

• All the references above!


Page 50:


Agile Autonomy:

High-Speed Flight with On-board Sensing and Computing

Antonio Loquercio

Code, datasets, videos, publications, and slides: https://antonilo.github.io/

[email protected]

@antoniloq @antonilo

Page 51:


The drone market is valued at $130 billion today

Inspection Agriculture

Transport Search and Rescue

Davide Scaramuzza

https://www.pwc.pl/pl/pdf/clarity-from-above-pwc.pdf

Page 52:


How are current commercial drones controlled?

➢ By a human pilot
▪ requires line of sight or video link

▪ requires a lot of training

➢ By the autopilot: autonomous navigation
▪ GPS: doesn't work in GPS-denied or degraded environments

▪ Lidar (e.g., Exyn): expensive, heavy, power hungry

▪ Cameras (e.g., Parrot, DJI, Skydio): cheap, lightweight, passive (i.e., low power)

Page 53:


Last 10-years Progress on Autonomous Vision-based Flight

2020: Skydio (2018-2020), DJI (2018-2020), NASA Mars Helicopter (2020)
First products in the market, or sent to another planet ☺

2010: EU SFLY Project (2009-2012) [Bloesch, ICRA 2010]
First onboard goal-oriented vision-based flight (previous research focused on reactive navigation)

Page 54:


The Traditional Drone Control Architecture

Images + IMU → State estimation (VIO) → Planning → Control → Control commands, tracking a given reference.

Page 55:


The Traditional Drone Control Architecture

Images + IMU → State estimation (VIO) → Planning → Control → Control commands, tracking a given reference.

• State estimation: mature algorithms, but brittle at high speed due to motion blur.
• Planning: requires strong assumptions about the environment.
• Control: needs significant tuning, especially at high speed.

Page 56:


Can we use Deep Learning to Control Drones? Yes!

Images + IMU + Reference → Neural Network → Control commands.

Page 57:


What are the biggest challenges?

1. Data Collection.

2. Input and Output Representation.

3. Guaranteeing platform safety during training and testing.

Page 58:


Data Collection: imitate cars and bicycles

➢ They are already integrated in the streets of a city

➢ Can leverage large publicly available datasets

➢ No need for an expert drone pilot

➢ Easy to collect negative experiences

DroNet: Learning to Fly by Driving, Loquercio et al., Robotics and Automation Letters 2018, PDF, Video, Code.

Page 59:


A drone that thinks it is a car

DroNet: Learning to Fly by Driving, Loquercio et al., Robotics and Automation Letters 2018, PDF, Video, Code

Page 60:


Generalization: Indoor corridors

➢ Extremely different in appearance from the outdoor training data: no fine-tuning!

DroNet: Learning to Fly by Driving, Loquercio et al., Robotics and Automation Letters 2018, PDF, Video, Code

Page 61:


What is DroNet looking at?

➢ Line-like features are a strong cue of direction

DroNet: Learning to Fly by Driving, Loquercio et al., Robotics and Automation Letters 2018, PDF, Video, Code

Page 62:


Data Collection: Simulation to Reality Transfer

What are the conditions for a neural network to transfer knowledge between domains?

Train (simulation) vs. Test (reality)

• Cannot harm drones at train time.

• Cheap to collect data.

• Can train on ANY trajectory, even ones that are difficult for drone pilots.

Page 63:


Flightmare Quadrotor Simulator

Song, Naji, Kaufmann, Loquercio, Scaramuzza, Flightmare: A Flexible Quadrotor Simulator, CORL’20, PDF Video Website

Source code: https://uzh-rpg.github.io/flightmare

Page 64:


Sim2Real: Domain Randomization for Drone Racing

Loquercio et al., Deep Drone Racing with Domain Randomization, IEEE T-RO, 2019. PDF. Video.
Best System Paper Award at the Conference on Robot Learning (CoRL), 2018.

Training Environments

Page 65:


Loquercio et al., Deep Drone Racing with Domain Randomization, IEEE T-RO, 2019. PDF. Video.
Best System Paper Award at the Conference on Robot Learning (CoRL), 2018.

Sim2Real: Domain Randomization for Drone Racing

Page 66:


Sim2Real: Input Abstractions for Drone Acrobatics

Kaufmann*, Loquercio*, Ranftl, Mueller, Koltun, Scaramuzza, Robotics, Science and Systems, 2020, PDF, Video, Code

Source code: https://github.com/uzh-rpg/deep_drone_acrobatics

Deep Drone Acrobatics


Page 67:


Acrobatics are Challenging even for Human Pilots

Kaufmann*, Loquercio*, Ranftl, Mueller, Koltun, Scaramuzza, Deep Drone Acrobatics, RSS 2020. PDF, Video, Code

WARNING: This drone is controlled by a human pilot!

Page 68:


Sim2Real: Learning via Intermediate Representations

Instead of raw inputs, the network consumes abstractions: images are converted to feature tracks and IMU measurements are integrated*; together with the reference, the neural network maps them to thrust and body-rate commands.

Page 69:


➢ The simulation-to-reality gap in performance for a finite-horizon controller directly depends on the covariate shift introduced by the simulated and real observation models.

➢ Find an abstraction function to reduce the gap between the simulated and real observation models.

Sim2Real: Learning via Intermediate Representations

Page 70:


Processing Asynchronous Data

• IMU and camera generate observations at different rates.
• We propose an asynchronous architecture that can process inputs arriving at different frequencies.

[Architecture: feature tracks, IMU samples (100 Hz) and the reference trajectory (50 Hz) are each encoded by a PointNet, passed through temporal convolutions, concatenated, and fed to a multilayer perceptron that outputs the command: thrust c and body rates ωx, ωy, ωz.]

Page 71:


Controlled Experiments in Simulation

We outperform the traditional pipeline based on estimation (VIO) and control (MPC)

Page 72:


• Intel RealSense T265 for visual-inertial measurements
• NVIDIA Jetson TX2 for neural network inference and flight control
• Weight: 900 g; maximum thrust: 70 N; thrust-to-weight ratio: ca. 7

Page 73:


Results: Indoor & Outdoor Flight

Kaufmann*, Loquercio*, Ranftl, Mueller, Koltun, Scaramuzza, Deep Drone Acrobatics, RSS 2020, PDF, Video, Code

Page 74:


Takeaways: Learning vs Model-Based Controller

Learning-based controller:
• Robust to imperfect perception
• Easy to program
• Does not generalize to different tasks
• Difficult to interpret what it is doing

Traditional controller:
• Sensitive to imperfect perception
• Requires extensive expertise
• High generalization: same controller for all maneuvers
• Transparent and easy to interpret

The future will be about how to best combine these two worlds.

Let’s now see a way to do it!

Page 75:


[Figure: predictions y with low uncertainty under nominal conditions, and high uncertainty under a sensor failure.]

• A robotic system cannot blindly trust neural network predictions:

• What happens if the current observation is very different from the training ones?

• What if some sensors fail?

Measuring Uncertainty in Deep Neural Networks

Loquercio et al., A General Framework for Uncertainty Estimation in Deep Learning, RA-L 2020, PDF , Video, Code.

Page 76:


Uncertainty associated with a prediction depends on two factors:

• Data Uncertainty

• Noise inherent in input data

• Independent of the training dataset

• Model Uncertainty

• Uncertainty about network weights

• Decreases with the amount of training data

Total Uncertainty

The total uncertainty is the combination of the two.
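Schematically, the combination can be sketched with a set of stochastic forward passes standing in for the network (e.g. Monte-Carlo dropout); all names and numbers below are illustrative, not the paper's exact estimator.

```python
import numpy as np

def total_uncertainty(samples, predicted_vars):
    # model uncertainty: spread of the predictions across stochastic forward
    # passes; this term decreases with the amount of training data
    model_var = np.var(samples)
    # data uncertainty: noise the network itself predicts for this input,
    # independent of the training dataset
    data_var = np.mean(predicted_vars)
    return model_var + data_var

# stand-ins for T stochastic forward passes of a network on one input
samples = np.array([1.0, 1.2, 0.8, 1.1])          # predicted values y
predicted_vars = np.array([0.05, 0.04, 0.06, 0.05])  # predicted variances
print(total_uncertainty(samples, predicted_vars))
```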

Page 77:


[Figure: predictions y with predicted variance σ²: low uncertainty under nominal conditions, high uncertainty under a sensor failure.]

Main contribution

• We propose a novel framework for uncertainty estimation of deep neural network predictions.

• Our framework accounts for the two sources of prediction uncertainty: data uncertainty & model uncertainty.

• Code available at https://github.com/uzh-rpg/deep_uncertainty_estimation

Uncertainty in Deep Learning: The goal

Page 78:


Whenever the neural network predictions are too uncertain, fall back to a safe mode.

Uncertainty Estimation for Closed-Loop Control

Loquercio et al., A General Framework for Uncertainty Estimation in Deep Learning, RA-L 2020, PDF , Video, Code.

Page 79:


➢ Autonomous vision-based agile flight via deep learning is very promising:

• Car-Driving Datasets or Simulation can be used to train deep navigation systems.

• Transfer knowledge via input and output abstractions.

• Measuring the networks' uncertainty is necessary to guarantee safety.

[email protected]

@antoniloq @antonilo

Conclusions & Takeaways