
SLAM: from Frames to Events

Davide Scaramuzza

http://rpg.ifi.uzh.ch

Institute of Informatics | Institute of Neuroinformatics

Research Topics

Real-time, Onboard Computer Vision and Control for Autonomous, Agile Drone Flight

P. Foehn et al., AlphaPilot: Autonomous Drone Racing, RSS 2020, Best System Paper Award. PDF. Video.

Research Topics

Real-time, Onboard Computer Vision and Control for Autonomous, Agile Drone Flight

Kaufmann et al., Deep Drone Acrobatics, RSS 2020, Best Paper Award finalist. PDF. Video.

Research Topics

Real-time, Onboard Computer Vision and Control for Autonomous, Agile Drone Flight

Loquercio et al., Agile Autonomy: Learning High-Speed Flight in the Wild, Science Robotics, 2021. PDF. Video. Code & Datasets.

SLAM: from Frames to Events

Today’s Outline

• A brief history of visual SLAM

• SVO and real-world applications

• Active exposure control

• Event cameras


A Brief history of Visual Odometry & SLAM

• Scaramuzza, D., Fraundorfer, F., Visual Odometry: Part I - The First 30 Years and Fundamentals, IEEE Robotics and Automation Magazine, Volume 18, issue 4, 2011. PDF

• Fraundorfer, F., Scaramuzza, D., Visual Odometry: Part II - Matching, Robustness, and Applications, IEEE Robotics and Automation Magazine, Volume 19, issue 1, 2012. PDF

• C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I.D. Reid, J.J. Leonard, Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age, IEEE Transactions on Robotics, Vol. 32, Issue 6, 2016. PDF

• Scaramuzza, Zhang, Visual-Inertial Odometry of Aerial Robots, Encyclopedia of Robotics, Springer, 2019, PDF.

• Huang, Visual-inertial navigation: A concise review, International conference on Robotics and Automation (ICRA), 2019. PDF.

• Gallego, Delbruck, Orchard, Bartolozzi, Taba, Censi, Leutenegger, Davison, Conradt, Daniilidis, Scaramuzza, Event-based Vision: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. PDF

A Brief history of Visual Odometry & SLAM

• 1980: First known VO implementation on a robot, in Hans Moravec's PhD thesis (NASA/JPL), for Mars rovers, using a single sliding camera ("sliding stereo")

• 1980 to 2000: VO research was dominated by NASA/JPL in preparation for the 2004 Mars mission

• 2000-2004: First real-time monocular VSLAM solutions (e.g., S. Soatto, A. Davison, D. Nister, G. Klein)

• 2004: VSLAM was used on a robot on another planet: the Mars rovers Spirit and Opportunity (see the seminal paper from NASA/JPL, 2007)

• 2015-today: VSLAM becomes a fundamental tool in several products: vacuum cleaners, scanners, VR/AR, drones, robots, smartphones

• 2021: VSLAM used on the Mars helicopter

Recently founded VO & SLAM companies

• AI Incorporated: SLAM for autonomy, software

• Artisense: SLAM for autonomy, software and hardware

• Augmented Pixels: SLAM for mapping, software

• GeoSLAM: SLAM for mapping, software

• Indoors: SLAM for indoor positioning and mapping, software

• Kudan: SLAM for autonomy, software

• ModalAI: SLAM hardware for drones

• MYNT EYE: manufacturer of camera-IMU sensors, hardware

• NavVis: SLAM for mapping, software and hardware

• Roboception: SLAM for robot arms, software and hardware

• Sevensense: SLAM for autonomy, software and hardware

• SLAMcore: SLAM for autonomy, software and hardware

• SUIND: SLAM for drone autonomy, software and hardware

• VanGogh Imaging: SLAM for object tracking and mapping, software

• Wikitude: SLAM for AR/VR, software

A Short Recap of the last 40 years of VIO

[Figure: 40-year timeline of VIO along the axes of accuracy, robustness, and efficiency]

• Feature-based (1980-2000)
• Feature-based + direct (from 2000)
• + IMU (from 2007): 10x accuracy
• + Machine learning (from 2012)
• + Event cameras (from 2014)

Evaluation axes:
• Accuracy
• Robustness (adverse environment conditions, HDR, motion blur, low texture, dynamic environments)
• Efficiency (speed, memory, and CPU load)

We need more datasets to evaluate the performance of SLAM


[Figure: datasets and benchmarks grouped by evaluation axis]

• Accuracy: HILTI-SLAM dataset 2021, ETH 3D dataset 2021, TUM VI benchmark 2021, Devon Island 2013, TUM RGB-D 2012, KITTI 2012, EuRoC 2016, ...

• Robustness (adverse environment conditions, HDR, motion blur, low texture): TartanAir dataset 2021, Blackbird 2018, UZH-FPV dataset 2018, DSEC 2021, MVSEC 2018, Event Camera dataset 2017, ...

• Efficiency (speed, memory, and CPU load): SLAMBench 3, 2019

• Realistic simulators: AirSim 2017, Flightmare, FlightGoggles 2019, ESIM 2018, ...

Algorithms are tuned to overfit datasets! We need a Common Task Framework!

HILTI SLAM Dataset & Challenge

Helmberger et al., The Hilti SLAM Challenge Dataset, arXiv preprint, 2021. PDF. Dataset.

• 2 LiDARs + 5 standard cameras + 3 IMUs

• Goal: benchmark the accuracy of structure and motion

HILTI SLAM Challenge – Leader Board

[Leaderboard figure: entries grouped into those that use predominantly LiDAR and those that use predominantly vision]

UZH-FPV Drone Racing Dataset & Challenge

• Goal: benchmarking VIO & VSLAM algorithms at high speed, where motion blur and high dynamic range are detrimental

• Recorded with a drone flown by a professional pilot at speeds of over 70 km/h

• Contains over 30 sequences with images, events, IMU, and ground truth from a robotic total station: https://fpv.ifi.uzh.ch/

• VIO leader board: https://fpv.ifi.uzh.ch/?sourcenova-comp-post=2019-2020-uzh-fpv-temporary-leader-board

Delmerico et al., "Are We Ready for Autonomous Drone Racing? The UZH-FPV Drone Racing Dataset", ICRA'19. PDF. Video. Datasets.

UZH-FPV Challenge

[Leaderboard figure: entries use either sliding-window optimization à la OKVIS or VINS-Mono, or filter-based estimation (MSCKF)]

No event cameras have been used yet!

Today’s Outline

• A brief history of visual SLAM

• SVO and real-world applications

• Active exposure control

• Event cameras


SVO

• Key needs: low latency, low memory, high speed
• Combines indirect + direct methods (both residuals are sketched below)
  • Direct (minimizes photometric error)
    • Used for frame-to-frame motion estimation
    • Corners and edgelets
    • Jointly optimizes poses & structure (sliding window)
  • Indirect (minimizes reprojection error)
    • Frame-to-keyframe pose refinement
• Mapping
  • Probabilistic depth estimation (heavy-tailed Gaussian distribution)
• Faster than real time (up to 400 Hz): 400 fps on i7 laptops and 100 fps on embedded/smartphone processors (Odroid (ARM), NVIDIA Jetson)

Forster, Zhang, Gassner, Werlberger, Scaramuzza, SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems, IEEE Transactions on Robotics (T-RO), 2017. PDF, code, videos.

[Figure: example corner and edgelet features]

Source code of SVO Pro: https://github.com/uzh-rpg/rpg_svo_pro_open
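For intuition, here is a minimal sketch of the two error terms SVO combines: a direct photometric residual for frame-to-frame alignment and an indirect reprojection residual for pose/structure refinement. This is illustrative only; the real system's camera models, sub-pixel interpolation, robust weighting, and Jacobians are omitted, and all names below are assumptions, not SVO's code.

```python
import numpy as np

def project(K, T, p_world):
    """Pinhole projection of a 3D point p_world (3,) with pose T (4x4, world->camera)."""
    p_cam = T[:3, :3] @ p_world + T[:3, 3]
    uv = K @ (p_cam / p_cam[2])
    return uv[:2]

def photometric_residual(I_ref, I_cur, K, T_ref, T_cur, p_world):
    """Direct term: intensity difference at the projections of the same 3D point."""
    u_ref = project(K, T_ref, p_world).round().astype(int)
    u_cur = project(K, T_cur, p_world).round().astype(int)
    return float(I_cur[u_cur[1], u_cur[0]]) - float(I_ref[u_ref[1], u_ref[0]])

def reprojection_residual(K, T_cur, p_world, u_measured):
    """Indirect term: pixel error between the projected point and its measured feature."""
    return project(K, T_cur, p_world) - u_measured
```

Minimizing the first residual over the relative pose gives frame-to-frame alignment without explicit data association; minimizing the second over poses and points gives the keyframe-based refinement.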

[Figure: SVO pipeline with corner and edgelet features and probabilistic depth estimation]

Processing time of SVO vs. ORB-SLAM, LSD-SLAM, DSO

Forster, Zhang, Gassner, Werlberger, Scaramuzza, SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems, IEEE Transactions on Robotics (T-RO), 2017. PDF, code, videos.

[Table: processing times in milliseconds]

The SVO front end is over 10x faster than state-of-the-art systems and 4x more efficient (it runs on half a CPU core instead of two cores).

This makes it appealing for real-time applications on embedded PCs (drones, smartphones)

SVO Pro (2021) – just released – does full SLAM!

Includes:

• Supports monocular and stereo systems, as well as omnidirectional camera models (fisheye and catadioptric)

• Visual-inertial sliding window optimization backend (modified from OKVIS)

• Loop closure via DBoW2

• Global bundle adjustment or pose-graph optimization via iSAM2 in real time (at frame rate); a toy pose-graph sketch follows below

Forster, Zhang, Gassner, Werlberger, Scaramuzza, SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems, IEEE Transactions on Robotics (T-RO), 2017. PDF, code, videos.

SVO Pro contains a full SLAM system running in real time

Source code of SVO Pro: https://github.com/uzh-rpg/rpg_svo_pro_open
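The global back end (loop closure plus pose-graph/bundle-adjustment optimization via iSAM2) can be pictured with a tiny incremental pose-graph example. The sketch below uses GTSAM's Python bindings with entirely made-up odometry and loop-closure measurements; it is not SVO Pro's actual code.

```python
import numpy as np
import gtsam

isam = gtsam.ISAM2()
graph = gtsam.NonlinearFactorGraph()
values = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.01))  # rot (rad) + trans (m)
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.05))

# Anchor the first keyframe pose.
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))
values.insert(0, gtsam.Pose3())

# Odometry factors as new keyframes arrive (made-up 1 m forward motion each).
step = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(1.0, 0.0, 0.0))
for k in range(1, 4):
    graph.add(gtsam.BetweenFactorPose3(k - 1, k, step, odom_noise))
    values.insert(k, gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(float(k), 0.0, 0.0)))

# A fictitious loop closure between keyframes 3 and 0 (e.g., detected by DBoW2).
loop = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-3.0, 0.0, 0.0))
graph.add(gtsam.BetweenFactorPose3(3, 0, loop, odom_noise))

isam.update(graph, values)          # incremental optimization, fast enough for frame rate
estimate = isam.calculateEstimate()
print(estimate.atPose3(3))
```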

More here: http://rpg.ifi.uzh.ch/svo2.html

• Throw-and-go (2015), which inspired many products, such as the DJI Tello drone

• Autonomous quadrotor navigation in dynamic scenes with a down-looking camera, running at 90 fps on an Odroid U3 board (ARM Cortex-A9)

• 20 m/s obstacle-free autonomous quadrotor flight at DARPA FLA (2015)

• Virtual reality with SVO running on an iPhone 6 (with the company Dacuda at CES 2017)

Startup: “Zurich-Eye” – Today: Facebook-Oculus Zurich

• Vision-based Localization and Mapping systems for mobile robots

• Born in Sep. 2015, became Facebook Zurich in Sep. 2016. Today >200 employees


• In 2018, Zurich-Eye launched Oculus Quest (2 million units shipped so far)

“From the lab to the living room”: the story behind Facebook’s Oculus Insight technology, from Zurich-Eye to Oculus Quest:

https://tech.fb.com/the-story-behind-oculus-insight-technology/

SVO and its derivatives are used today in many products…

• DJI drones

• Magic Leap AR headsets

• Oculus VR headsets

• Huawei phones

• Nikon cameras

• …


Takeaway: Partner with industry to understand the key problems

• Industry provides use cases

• They have very stringent requirements:
  • Low latency
  • Low energy (e.g., AR, VR, always-on devices): see the NAVION or PULP chips
  • Robustness to HDR, blur, dynamic environments, and harsh environmental conditions
  • Accuracy: e.g., construction monitoring requires maps with <5 mm absolute error


Today’s Outline

• A brief history of visual SLAM

• SVO and real-world applications

• Active exposure control

• Event cameras


HDR scenes are challenging for SLAM

• Cameras have limited dynamic range

• Built-in auto-exposure is optimized for image quality, not for SLAM!


Idea: Actively adjust the exposure time

Active Camera Exposure Control

Zhang, Forster, Scaramuzza, Active Exposure Control for Robust Visual Odometry in HDR Environments, ICRA'17. PDF. Video.

[Video comparison: standard built-in auto-exposure vs. our active exposure control]
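The approach of [Zhang, ICRA'17] chooses the exposure time that maximizes a gradient-based image-quality metric rather than mean brightness. The sketch below captures that idea in its simplest form, as a brute-force search over candidate exposures with a hypothetical `capture` interface; the paper's actual metric, photometric camera model, and controller are more elaborate.

```python
import numpy as np

def gradient_score(img):
    """Sum of image gradient magnitudes: a proxy for how much texture
    a feature-based or direct VO front end can exploit."""
    gx, gy = np.gradient(img.astype(np.float32))
    return np.sum(np.sqrt(gx**2 + gy**2))

def choose_exposure(capture, exposures_us):
    """Pick the exposure (microseconds) whose image maximizes the gradient score.
    `capture(t_exp)` is an assumed camera interface returning a grayscale frame."""
    scores = {t: gradient_score(capture(t)) for t in exposures_us}
    return max(scores, key=scores.get)

# Example with a fake camera model (clipping at 255 simulates saturation):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.uniform(0.0, 1.0, (120, 160))              # latent scene radiance
    capture = lambda t: np.clip(scene * t / 5000.0, 0, 1) * 255
    print(choose_exposure(capture, [1000, 2000, 5000, 10000, 20000]))
```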


Takeaway: make your algorithm scene aware!

Cameras have many parameters that can be adaptively tuned or actively controlled to achieve the best performance

• Scene-aware exposure-time control [Zhang, ICRA'17]

• Scene-aware motion-blur & rolling-shutter compensation [Meilland, ICCV'13], [Liu, ICCV'21]

• More generally, we need scene-aware, continuous self-calibration & parameter control


Today’s Outline

• A brief history of visual SLAM

• SVO and real-world applications

• Active exposure control

• Event cameras


Open Challenges in Computer Vision

The past 60 years of research have been devoted to frame-based cameras but they are not good enough!

[Figure: limitations of frame-based cameras: dynamic range, latency & motion blur]

Event cameras do not suffer from these problems!

What is an event camera?

• Novel sensor that measures only motion in the scene

• Key advantages:
  • Low latency (~1 μs)
  • No motion blur
  • Ultra-low power (mean: 1 mW vs 1 W)
  • High dynamic range (140 dB instead of 60 dB)


VGA event camera from Prophesee


Traditional vision algorithms cannot be directly used, because the event-camera output is a stream of asynchronous pixel events.

Opportunities

• Low latency: AR/VR, automotive (<10ms)

• Low energy: AR/VR, always-on devices (see Synsense)

• HDR & No motion blur


Who sells event cameras and how much are they?

• Prophesee & SONY:• ATIS sensor: events, IMU, absolute intensity at the event pixel• Resolution: 1M pixels• Cost: ~5,000 USD

• Inivation & Samsung• DAVIS sensor: frames, events, IMU. • Resolution: VGA (640x480 pixels)• Cost: ~5,000 USD

• CelePixel Technology & Omnivision:• Celex One: events, IMU, absolute intensity at the event pixel• Resolution: 1M pixels• Cost: ~1,000 USD

• Cost to sink to <5$ when killer application found(recall first ToF camera (>10,000 USD) today <50 USD)

35

Generative Event Model

• Consider the intensity at a single pixel.

• An event is triggered when the log intensity change passes a threshold 𝐶:

log 𝐼(𝒙, 𝑡) − log 𝐼(𝒙, 𝑡 − Δ𝑡) = ±𝐶,    where 𝐶 is the contrast threshold

• ON events correspond to positive changes, OFF events to negative changes

• Notice that events are generated asynchronously

[Figure: log-intensity signal log 𝐼(𝒙, 𝑡) at a single pixel and the ON/OFF events it triggers]
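A minimal per-pixel simulation of this generative model is sketched below. It is illustrative only: real sensors have per-pixel threshold variations, noise, and refractory periods, which are ignored here.

```python
import numpy as np

def generate_events(times, log_intensity, C=0.2):
    """Simulate the per-pixel event model: emit an event each time the log
    intensity deviates from the level at the last event by at least C.
    Returns a list of (timestamp, polarity) with polarity +1 (ON) / -1 (OFF)."""
    events = []
    ref = log_intensity[0]                      # log intensity at the last event
    for t, logI in zip(times, log_intensity):
        while logI - ref >= C:                  # ON events
            ref += C
            events.append((t, +1))
        while ref - logI >= C:                  # OFF events
            ref -= C
            events.append((t, -1))
    return events

# Example: a pixel observing a sinusoidal brightness signal
t = np.linspace(0.0, 1.0, 10_000)               # dense time samples (seconds)
logI = np.log(1.5 + np.sin(2 * np.pi * 5 * t))  # log intensity at this pixel
events = generate_events(t, logI, C=0.2)
print(f"{len(events)} events, first few: {events[:3]}")
```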

Do events carry the same visual information as normal cameras?

[Figure: raw events and image reconstructions from Munda, IJCV'18 and Scheerlinck, ACCV'18]

From the event-generation model, we can reconstruct images up to an unknown intensity value

Results are far from perfect, mainly because the contrast threshold is not constant (it depends on scene content).
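For intuition, the crudest model-based reconstruction simply accumulates event polarities scaled by 𝐶 at each pixel, recovering the log-intensity change since the start, i.e. the image up to the unknown initial value. The optimization-based methods cited above are far more sophisticated; the sketch below only illustrates the principle, and the event format is an assumption.

```python
import numpy as np

def integrate_events(events, width, height, C=0.2):
    """Naive reconstruction: the per-pixel sum of event polarities times C gives
    the log-intensity *change* since the start, i.e. the image up to an offset.
    `events` is an iterable of (x, y, polarity) with polarity in {+1, -1}."""
    log_image = np.zeros((height, width), dtype=np.float32)
    for x, y, p in events:
        log_image[y, x] += p * C
    return log_image
```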

Can we learn video from events end to end?

Can we learn to reconstruct video from events?


[Video: raw events and the video reconstructed from events]

Rebecq et al., “High Speed and High Dynamic Range Video with an Event Camera”, T-PAMI’19. PDF Video Code

The video reconstruction is now very accurate because the network learns an implicit noise model

Learned from Simulation only – One-Shot

• Recurrent neural network based on U-Net; its event-tensor input representation is sketched below

• Trained in simulation only, deployed on a real event camera without fine tuning

• We randomize the contrast sensitivity to reduce sim-to-real gap

• Generalizes to real and different event cameras without fine tuning


Source code & Datasets: https://github.com/uzh-rpg/rpg_e2vid

Rebecq et al., “High Speed and High Dynamic Range Video with an Event Camera”, T-PAMI’19. PDF Video Code
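E2VID does not consume raw event streams directly: consecutive events are first binned into a spatio-temporal event tensor (voxel grid) that the recurrent network takes as input. Below is a minimal sketch of such a binning; the exact temporal interpolation, normalization, and tensor layout of the released code may differ.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, width, height):
    """Bin events given as rows (t, x, y, polarity) into a (num_bins, H, W) grid,
    distributing each event's polarity over its two nearest temporal bins."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    t_norm = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1)   # in [0, B-1]
    x, y, p = events[:, 1].astype(int), events[:, 2].astype(int), events[:, 3]
    left = np.floor(t_norm).astype(int)
    right = np.clip(left + 1, 0, num_bins - 1)
    w_right = t_norm - left
    np.add.at(grid, (left, y, x), p * (1.0 - w_right))
    np.add.at(grid, (right, y, x), p * w_right)
    return grid
```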

Reconstructed video inherits all advantages of event cameras: e.g., high temporal resolution


Source code & Datasets: https://github.com/uzh-rpg/rpg_e2vid

Bullet: 1,300 km/h

Rebecq et al., "High Speed and High Dynamic Range Video with an Event Camera", T-PAMI'19. PDF. Video. Code.

[Video: Huawei P20 phone camera vs. our reconstruction from events at over 5,000 fps]

Reconstructed video inherits all advantages of event cameras: e.g., high dynamic range


Source code & Datasets: https://github.com/uzh-rpg/rpg_e2vid

[Video: Huawei P20 phone camera, our reconstruction from events, and the raw events]

Rebecq et al., "High Speed and High Dynamic Range Video with an Event Camera", T-PAMI'19. PDF. Video. Code.

What happens if we feed reconstructed video to a state-of-the-art SLAM algorithm?

Rebecq et al., "High Speed and High Dynamic Range Video with an Event Camera", T-PAMI'19. PDF. Video. Code.

The SLAM inherits all the advantages of event cameras: no motion blur, HDR, low-latency!

The Key Challenge

• The fact that we can reconstruct high quality video means that event cameras carry the same visual information as standard cameras

• So it must be possible to perform all vision tasks of standard cameras

• But we want to build efficient and low energy algorithms that compute the output without passing through intermediate image reconstruction

[Diagram: events → image reconstruction → CV algorithm → output; the goal is to skip the intermediate image-reconstruction step]

Application 1: Low-Latency & Low-Energy Tracking

• [1] Gallego et al., Event-based 6-DOF Camera Tracking from Photometric Depth Maps, T-PAMI'18. PDF. Video.
• [2] Mueggler et al., Continuous-Time Visual-Inertial Odometry for Event Cameras, T-RO'18. PDF.
• [3] Rosinol et al., Ultimate SLAM?, RAL'18, Best Paper Award finalist. PDF. Video. IEEE Spectrum.
• [4] Gehrig et al., EKLT: Asynchronous, Photometric Feature Tracking using Events and Frames, IJCV 2019. PDF, YouTube, Evaluation Code, Tracking Code.

Application 2: “Ultimate SLAM”

Goal: combining events, images, and IMU for robustness to HDR and high speed scenarios

• Front end: feature tracking from events and frames (a toy event-frame construction is sketched below)
• Back end: state-of-the-art nonlinear-optimization-based VIO

Rosinol-Vidal, Rebecq, Horstschaefer, Scaramuzza, Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios, IEEE Robotics and Automation Letters (RAL), 2018. PDF. Video. Best Paper Award Honorable Mention.
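Ultimate SLAM tracks features on both standard frames and "event frames" built by aggregating events over short temporal windows (the paper additionally motion-compensates the events using the IMU, which is omitted here). Below is a rough, illustrative sketch using standard OpenCV calls; none of this is the authors' code, and the event format is an assumption.

```python
import numpy as np
import cv2

def make_event_frame(events, width, height, window=0.01):
    """Accumulate the last `window` seconds of events (t, x, y, polarity)
    into a signed 2D histogram, then normalize it to an 8-bit 'event frame'."""
    t_max = events[-1, 0]
    recent = events[events[:, 0] >= t_max - window]
    frame = np.zeros((height, width), dtype=np.float32)
    np.add.at(frame, (recent[:, 2].astype(int), recent[:, 1].astype(int)), recent[:, 3])
    frame = cv2.normalize(frame, None, 0, 255, cv2.NORM_MINMAX)
    return frame.astype(np.uint8)

def detect_features(event_frame, max_corners=100):
    """Detect corners on the event frame; they can then be tracked with
    Lucas-Kanade (cv2.calcOpticalFlowPyrLK) exactly as on a standard frame."""
    return cv2.goodFeaturesToTrack(event_frame, max_corners,
                                   qualityLevel=0.01, minDistance=7)
```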

Application 2: “Ultimate SLAM”

85% accuracy gain over standard VIO in HDR and high speed scenarios

[Video: standard camera vs. event camera]


Application of Ultimate SLAM: Autonomous Flight despite Rotor Failure

• Quadrotors subject to full rotor failure require accurate position estimates to avoid crashing
• State-of-the-art systems used external position-tracking systems (e.g., GPS, Vicon, UWB)
• We achieve this with only onboard cameras; with event cameras, we can make it work in very low light!

Sun, Cioffi, de Visser, Scaramuzza, Autonomous Quadrotor Flight despite Rotor Failure with Onboard Vision Sensors: Frames vs. Events, IEEE RAL 2021. PDF. Video. Code. 1st place winner of the NASA TechBrief Award: Create the Future Contest.

Application 3: Slow Motion Video

• We can combine an event camera with an HD RGB camera (a toy baseline for this idea is sketched below)

• We use events to upsample low-framerate video by over 50 times, with only 1/40th of the memory footprint!

Tulyakov et al., TimeLens: Event-based Video Frame Interpolation, CVPR'21. PDF. Video. Code.

Code & Datasets: http://rpg.ifi.uzh.ch/timelens


Application 4: Event-Guided Depth Sensing

Muglikar et al., Event Guided Depth Sensing, 3DV'21. PDF.

• Problem: standard depth sensors (ToF, LiDAR, structured light) sample depth uniformly and at a fixed scan rate, thus oversampling redundant static information → large power consumption and high latency

• Idea: use an event camera to guide the depth-measurement process: scan the areas that generate events with higher spatial density, and the remaining areas with lower density (a toy sketch of this follows below)

• Finding: since moving edges correspond on average to less than 10% of the scene, event-guided depth sensing could lead to almost 90% less power consumption by the illumination source
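A toy sketch of the guiding idea, under the assumption that the depth sensor can be commanded with an arbitrary per-pixel sampling mask; the paper's actual sampling strategy and sensor interface are more involved.

```python
import numpy as np

def sampling_mask(events, width, height, sparse_stride=8):
    """Toy event-guided sampling: request depth densely where events occur
    (moving edges) and only on a coarse grid elsewhere.
    `events` is an array of rows (t, x, y, polarity)."""
    activity = np.zeros((height, width), dtype=np.int32)
    np.add.at(activity, (events[:, 2].astype(int), events[:, 1].astype(int)), 1)

    mask = np.zeros((height, width), dtype=bool)
    mask[::sparse_stride, ::sparse_stride] = True   # coarse uniform scan of static areas
    mask |= activity > 0                            # dense scan where the scene is moving
    return mask
```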

Conclusion

• Visual Inertial SLAM theory is well established

• Biggest challenges today are reliability and robustness to:
  • HDR, low light, adverse environment conditions, motion blur, low texture, dynamic environments

• Active control of camera parameters, such as exposure time, can help greatly

• Machine learning exploits context & provides robustness
  • The best way to use it is to combine it with geometric approaches

• Event cameras are complementary to standard cameras:
  • They provide robustness to high-speed motion and HDR scenes
  • They allow low latency and low energy, which is key for AR/VR and always-on devices

• Current SLAM datasets are saturated: new datasets & challenges are needed (Common Task Framework)!


Thanks!

Code, datasets, videos, and publications, slides: http://rpg.ifi.uzh.ch/

I am hiring PhD students and Postdocs in AI

@davsca1 @davidescaramuzzaailabRPG