Khronos Group... - 3ds Max, Adobe Photoshop CS3, Blender, DAZ|Studio, Feeling...

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu*

Department of Computer Science and Technology, Tsinghua University, Beijing, China

[email protected], [email protected], [email protected], [email protected]

* Corresponding author.

Abstract

In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on a PoseNet architecture, our method recovers relative pose by directly solving the fundamental matrix from dense optical flow correspondences and uses a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection checks. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitations of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.

1. Introduction

Reconstructing the underlying 3D scenes from a collection of video frames or multi-view images has been a long-standing fundamental topic named structure-from-motion (SfM), which serves as an essential module in many real-world applications such as autonomous vehicles, robotics, and augmented reality. While traditional methods are built on the golden rule of feature correspondence and multi-view geometry, a recent trend of deep-learning-based methods [42, 15, 66] tries to jointly learn the prediction of monocular depth and ego-motion in a self-supervised manner, aiming to exploit the learning ability of deep networks to learn geometric priors from large amounts of training data.

Figure 1. Visual odometry results on sampled sequences 09 and 10 from the KITTI Odometry dataset. We sample the original sequences with a large stride (stride=3) to simulate fast camera ego-motion that is unseen during training. Surprisingly, all tested PoseNet-based methods fail similarly at trajectory estimation in this challenging scenario. Our system significantly improves generalization ability and robustness and still works reasonably well on both sequences. See more discussion in Sec 4.4.

The key to those self-supervised learning methods is to build task consistency for training separate CNNs, where depth and pose predictions are jointly constrained by depth reprojection and image reconstruction error. While achieving fairly good results, most existing methods assume that a consistent scale of CNN-based monocular depth prediction and relative pose estimation can be learned across all input samples, since relative pose estimation inherently has scale ambiguity.
Although several recent proposals manage to mitigate this scale problem [2, 12], this strong hypothesis still makes the learning problem difficult and leads to severely degraded performance, especially in long-sequence visual odometry applications and indoor environments, where relative pose varies substantially across sequences.
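The abstract states that the depth prediction is aligned in scale with the triangulated point cloud before depth errors and reprojection checks are computed, but this excerpt does not give the alignment rule. A minimal sketch, assuming a closed-form least-squares scale over pixels where both the predicted and the triangulated depth are valid; `align_scale` and `rescale` are hypothetical helpers, and the authors' actual rule may differ:

```python
from typing import List, Sequence


def align_scale(pred_depth: Sequence[float], tri_depth: Sequence[float]) -> float:
    """Return the scale s minimizing sum((s * pred - tri) ** 2) over valid pairs.

    Assumption: a pair is valid when both depths are positive. Setting the
    derivative to zero gives s = sum(pred * tri) / sum(pred ** 2).
    """
    num = 0.0
    den = 0.0
    for p, t in zip(pred_depth, tri_depth):
        if p > 0.0 and t > 0.0:  # skip missing or behind-camera samples
            num += p * t
            den += p * p
    if den == 0.0:
        raise ValueError("no valid depth pairs to align")
    return num / den


def rescale(pred_depth: Sequence[float], s: float) -> List[float]:
    """Apply the scale so the prediction matches the triangulation's frame."""
    return [s * d for d in pred_depth]


if __name__ == "__main__":
    pred = [1.0, 2.0, 4.0, 0.0]  # last sample invalid (zero depth)
    tri = [0.5, 1.0, 2.0, 3.0]   # same structure at half the scale
    s = align_scale(pred, tri)
    print(s)                     # 0.5
    print(rescale(pred, s))      # [0.5, 1.0, 2.0, 0.0]
```

A median of per-pixel ratios (tri / pred) is a common alternative that is less sensitive to outliers in the triangulated points.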


Post on 05-Aug-2020


TRANSCRIPT

Page 1: Khronos Group...- 3ds Max, Adobe Photoshop CS3, Blender, DAZ|Studio, Feeling Viewer, - NVIDIA FX Composer, Google Earth, Houdini, Maya, Sketchup, and XSI † Google

© Copyright Khronos Group, 2007 - Page 1


Page 2

“DirectX-like” set of native APIs. Includes mixed media acceleration and OS portability APIs


Embedded Media Acceleration APIs

Vector 2D Streaming Media Enhanced Audio

Dynamic Media Authoring

Dynamic Media Authoring Standards

Cross platform desktop 3D

Cross-platform graphics authoring/acceleration ecosystem


Safety Critical 2D/3D

3D Authoring Effects Framework

Composition Working Group: Hardware acceleration for window systems

2D/3D

All open standards for 3D graphics acceleration and authoring are now being developed in Khronos. Tremendous opportunity for collaboration and synergy

Page 3

Authoring requirements for embedded devices and platforms

Tools and standards to create and distribute compelling embedded content


Page 4

• Gaming

- Doom3 / Quake 4 / Prey, Pacific Fighters, Serious Sam, World of Warcraft (on Apple)

• MCAD

- Dassault CATIA, PTC Pro/Engineer

• DCC

- Autodesk Maya, Softimage XSI

• Imaging

- Medical – GE / Philips / Siemens

- Oil and Gas – Landmark / Paradigm

• Scientific Visualization

- Many university and research applications

• High-end Video Editing

- Adobe After Effects, Discreet Fire / Smoke

OpenGL has been the leading cross-platform graphics API for over 10 years. The OpenGL Architecture Review Board (ARB) voted to become part of Khronos in July 2006

Page 5

• OpenGL and OpenGL ES are now evolving under one IP framework

- Design innovations can be freely shared between the APIs

• OpenGL and OpenGL ES can share the same resources and outreach

- Common conformance tests, marketing and website, tool chains, etc.

• OpenGL and OpenGL ES Working Groups will remain independent

- Both groups will be able to make decisions that best serve their own markets

Embedded Markets

Desktop Markets

Architectural design expertise

Market feedback on streamlining functionality

Momentum - hundreds of millions of OpenGL ES devices

Page 6

• OpenGL ES has leveraged desktop OpenGL architecture

- OpenGL ES 1.1 streamlined OpenGL 1.5

- OpenGL ES 2.0 streamlined OpenGL 2.0 and the GLSL shading language

• Next Generation OpenGL “Mount Evans” will leverage OpenGL ES 2.0

- Adding significant new functionality to the streamlined OpenGL ES 2.0 core

• OpenGL ES will benefit from the architectural innovation of “Mount Evans”

- Next-generation OpenGL ES, “Halti”, will use desktop innovations for performance and functionality

OpenGL 1.5/2.0: Architectural Foundation

OpenGL ES 1.1/2.0: Functional Streamlining

“Longs Peak/Mount Evans”: Next-generation functionality starting with the OpenGL ES 2.0 streamlined core

“Halti”: Streamlining next-generation functionality for embedded markets

Page 7

July 2006 - OpenGL 2.1 Released and OpenGL ARB Joins Khronos

OpenGL SDK and Quarterly Newsletter Released

“Longs Peak” Target Release Date

Timeline: 3Q06, 4Q06, 1Q07, 2Q07, 3Q07, 4Q07

“Mount Evans” Target Release Date

OpenGL 2.1 - Programmability Enhancements

- Backwards compatible with all previous versions of OpenGL

- Pixel Buffer Objects for fast copies to/from framebuffer

- sRGB color space textures for color management flexibility

- Increased flexibility of shader programming: non-square matrix support, arrays as first-class objects, etc.

Mount Evans – New Generation Functionality

- Builds on Longs Peak

- Geometry shader

- Stream out of vertex data to a buffer object

- Texture arrays and buffer objects

Longs Peak – API Rationalization

- Easy migration path from OpenGL 2.0

- Introduces new object model

- Higher performance

- Easier coding

- Easier to implement
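The OpenGL 2.1 item on sRGB color space textures means that, for a texture stored in an sRGB internal format such as GL_SRGB8_ALPHA8, the GL converts sampled texel values from sRGB encoding to linear values. A minimal Python sketch of the standard sRGB transfer functions (the piecewise curve from IEC 61966-2-1) shows what that conversion computes; it runs on the CPU and is an illustration, not a GL call:

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92  # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(x: float) -> float:
    """Encode one linear channel value in [0, 1] back to sRGB."""
    if x <= 0.0031308:
        return x * 12.92
    return 1.055 * x ** (1 / 2.4) - 0.055


if __name__ == "__main__":
    # An 8-bit mid-grey texel (128) is much darker once decoded to linear.
    lin = srgb_to_linear(128 / 255)
    print(round(lin, 4))
    print(round(linear_to_srgb(lin) * 255))  # back to 128
```

Doing this decode in hardware at sampling time lets shaders light and blend in linear space while textures stay stored in the perceptually compact sRGB encoding.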

Page 8

• ARB-developed resources

- API documentation and formal specifications

• Community-developed resources selected by the ARB

- Libraries, tutorials, tools

- gDEBugger: ARB is subsidizing Graphic Remedy for OpenGL developers in academia

• Developer documentation (“man pages”)

- Completely up to date with OpenGL 2.1

- In DocBook XML format, easy to retarget to many different delivery formats

• OpenGL Pipeline Newsletter

- Quarterly, includes status reports and mini-tutorials

• OpenGL.org Message Boards

- ARB members and many other smart developers participate

Page 9

• COLLADA is an XML database schema for 3D assets

- Can hold geometry, animation, visual effects, physics – everything to do with a scene

• COLLADA transports 3D assets between applications

- Enables binding of diverse DCC and 3D processing tools into a production pipeline

• COLLADA can be lossless – never lose information

- Retains all information, even multiple versions of the same asset

• COLLADA is an open, archive-grade format that retains meta information

- When your DCC tool upgrades, you keep your assets

Tool 1

Tool 2

Tool 3

Tool 4

COLLADA is non-destructive and so supports round-tripping of tools to enable powerful authoring pipelines
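Because COLLADA is an XML schema, any standard XML parser can read the asset data that tools exchange. A minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming a COLLADA 1.4.1 document (whose default namespace is `http://www.collada.org/2005/11/COLLADASchema`); the sample is trimmed for illustration and is not schema-valid, and `list_geometries` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Default namespace declared by COLLADA 1.4.x documents.
NS = {"c": "http://www.collada.org/2005/11/COLLADASchema"}

# Trimmed sample: a real <geometry> would also carry a <mesh> with sources.
SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <asset>
    <created>2007-01-01T00:00:00Z</created>
    <modified>2007-01-01T00:00:00Z</modified>
  </asset>
  <library_geometries>
    <geometry id="box-geom" name="box"/>
    <geometry id="plane-geom" name="plane"/>
  </library_geometries>
</COLLADA>"""


def list_geometries(xml_text: str) -> Tuple[Optional[str], List[Tuple[Optional[str], Optional[str]]]]:
    """Return the COLLADA version and the (id, name) of every <geometry>."""
    root = ET.fromstring(xml_text)
    geoms = [
        (g.get("id"), g.get("name"))
        for g in root.findall("c:library_geometries/c:geometry", NS)
    ]
    return root.get("version"), geoms


if __name__ == "__main__":
    version, geoms = list_geometries(SAMPLE)
    print(version)  # 1.4.1
    print(geoms)    # [('box-geom', 'box'), ('plane-geom', 'plane')]
```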

Page 10

• Conditioning pipelines take authored assets and:

• 1. Strip out authoring-only information

• 2. Resize to suit the target platform

• 3. Compress and format binary data for the target platform

• Different target platforms can use the same asset database with the appropriate conditioning pipeline

Multiple tools create assets and scenes in a COLLADA Database

Conditioning Pipeline

Conditioning Pipeline

• COLLADA is an interchange format - not a delivery format or a scene graph
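The three conditioning steps can be sketched against a COLLADA document. A minimal Python sketch, assuming that authoring-only data lives in COLLADA `<extra>` elements (where tools store application-specific information) and that packing each `<float_array>` as little-endian 32-bit floats is an acceptable binary format for the target; the resize step is target-specific and omitted, and all function names are hypothetical:

```python
import struct
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://www.collada.org/2005/11/COLLADASchema}"

# Trimmed sample; not schema-valid (a real <mesh> also needs <vertices> etc.).
SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_geometries>
    <geometry id="tri">
      <mesh>
        <source id="tri-pos">
          <float_array id="tri-pos-array" count="9">0 0 0 1 0 0 0 1 0</float_array>
        </source>
      </mesh>
      <extra><technique profile="MAYA"><double_sided>1</double_sided></technique></extra>
    </geometry>
  </library_geometries>
</COLLADA>"""


def strip_authoring_data(root: ET.Element) -> ET.Element:
    """Step 1: drop every <extra> element (tool-specific authoring data)."""
    # Collect first, then remove, so the tree is not mutated while iterating.
    doomed = [(parent, child) for parent in root.iter() for child in parent
              if child.tag == NS + "extra"]
    for parent, child in doomed:
        parent.remove(child)
    return root


def pack_float_arrays(root: ET.Element) -> Dict[str, bytes]:
    """Step 3: format each <float_array> as little-endian float32 bytes."""
    packed = {}
    for fa in root.iter(NS + "float_array"):
        values = [float(v) for v in (fa.text or "").split()]
        packed[fa.get("id")] = struct.pack(f"<{len(values)}f", *values)
    return packed


def condition(xml_text: str) -> Dict[str, bytes]:
    root = strip_authoring_data(ET.fromstring(xml_text))
    # Step 2 (resizing for the target platform) would go here.
    return pack_float_arrays(root)


if __name__ == "__main__":
    blobs = condition(SAMPLE)
    print({k: len(v) for k, v in blobs.items()})  # {'tri-pos-array': 36}
```

Because each target gets its own conditioning run, the COLLADA database itself stays untouched and can feed other pipelines.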

Page 11

• COLLADA 1.4 supported by all major tools and thousands of users

- 3ds Max, Adobe Photoshop CS3, Blender, DAZ|Studio, Feeling Viewer, NVIDIA FX Composer, Google Earth, Houdini, Maya, Sketchup, and XSI

• Google Earth v4 imports COLLADA

- KML v2.1 imports COLLADA models such as buildings, monuments, and statues

- Full support for textures and level of detail

• Unreal Engine 3 using COLLADA

- To enable assets to be imported from any authoring tool

• Adobe Photoshop CS3 imports COLLADA

- For texture editing on 3D objects

Page 12

• Adding support for OpenGL ES 2.0 shaders

- Enhanced shader authoring

• Richer asset types

- Integration of audio, video and vector graphics assets

Timeline (2H04 to 1H07), milestones shown:

- COLLADA Becomes Khronos Working Group

- COLLADA Adopted by Google as import format for Google Earth

- COLLADA 1.0 released by Sony at SIGGRAPH

- COLLADA 1.4 Released

- First COLLADA Textbook

- OpenGL ES 2.0 supported

- Conformance Tests Released

Page 13

• Shader effects are a new paradigm

- Consist of textures, shaders, geometry passes and control

• DCC tools author shaders using COLLADA FX file format

- Built on CgFX – contributed by NVIDIA

• Now need ways to use effects in applications at runtime

- Enable effects to be portably deployed on many different devices

Authoring

Deployment

Streamline the deployment of shader-rich content by defining file formats and APIs to enable shaders to be portably deployed and accelerated

Page 14

Application

OpenGL 2.0 / OpenGL ES 2.0

glFX Runtime API

- Existing Khronos standards

- New glFX standard

- Application specific code/data

Application traverses scene data and uses the glFX Runtime API to extract effects information to set up the rendering pipeline

COLLADA FX Effects: textures, shader programs, geometry, control and pass information

COLLADA: file format for 3D asset interchange – widely adopted by DCC tool vendors and Google, Adobe, Epic, etc.

Optional Conditioning for Delivery: optionally create data representations for delivery to target devices
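The slide leaves the glFX Runtime API itself unspecified, so the following does not model it. It is a minimal Python sketch of the surrounding idea only: an application reads COLLADA FX `<effect>` elements and picks, per target device, one of the profiles the effect provides (COLLADA 1.4 defines `profile_COMMON`, `profile_CG`, `profile_GLSL` and `profile_GLES`). The preference order, the trimmed sample document, and the `choose_profiles` helper are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = "{http://www.collada.org/2005/11/COLLADASchema}"

# Hypothetical preference order per target; not defined by COLLADA or glFX.
PREFERENCES: Dict[str, List[str]] = {
    "desktop": ["profile_GLSL", "profile_CG", "profile_COMMON"],
    "embedded": ["profile_GLES", "profile_COMMON"],
}

# Trimmed sample: real profiles contain <technique> elements with passes.
SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_effects>
    <effect id="metal"><profile_COMMON/><profile_GLSL/></effect>
    <effect id="glass"><profile_COMMON/></effect>
  </library_effects>
</COLLADA>"""


def choose_profiles(xml_text: str, target: str) -> Dict[str, Optional[str]]:
    """Map each effect id to the first preferred profile it provides, or None."""
    root = ET.fromstring(xml_text)
    chosen: Dict[str, Optional[str]] = {}
    for effect in root.iter(NS + "effect"):
        # Child tags arrive as "{namespace}profile_X"; keep only the local name.
        available = {child.tag.rsplit("}", 1)[-1] for child in effect}
        chosen[effect.get("id")] = next(
            (p for p in PREFERENCES[target] if p in available), None)
    return chosen


if __name__ == "__main__":
    print(choose_profiles(SAMPLE, "desktop"))
    # {'metal': 'profile_GLSL', 'glass': 'profile_COMMON'}
    print(choose_profiles(SAMPLE, "embedded"))
    # {'metal': 'profile_COMMON', 'glass': 'profile_COMMON'}
```

Falling back to `profile_COMMON` keeps every effect renderable on a target that lacks a matching shader profile, at the cost of fixed-function-style shading.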

Page 15

Page 16

• It doesn’t!

• Khronos is purely a non-profit organization

- Funded by member dues – to cover costs

• Our members make money by selling PRODUCTS enabled by standards

- NOT trying to charge for the standard itself

Selling an API would generate relatively small amounts of revenue

An open, royalty-free API standard creates much larger market opportunities

Our members cooperate to create standards – and compete in the marketplace with products that use Khronos standards

Page 17

Conformance Tests

Promoters

Adopters

Ratified Specifications

Implementers

Conformance Tests and Conformance Test Process. Typically $10K per API fee

Anyone can download specifications and SDKs and implement royalty-free products

Conforming products can use API trademark and logo

Openly and publicly distributed – free of charge, royalty free

Board decides strategy – approves working groups, controls budget, ratifies specifications. $20,000 annual membership dues

SDKs

Free libraries, utilities, examples

Contributors

Any company can join Khronos to participate in any number of working groups to produce specifications. $6,000 annual membership dues

A Working Group for each API standard – one company one vote

Page 18

Non-Members

Become Adopter Member ($2,500 annual fee)

Sign Adopters Agreement to access Test Source and Process (typically $10K per API)

Port Test Source to product and generate test results

Upload test results to Khronos private website for peer review by other Adopters

Successful Review means products can use Khronos trademarks

Khronos Contributor or Promoter Members

2D/3D

Vector 2D

Streaming Media

Already released

COLLADA, OpenKODE coming soon

Page 19

• “Foundation Level” APIs

- Close to the silicon – fundamental functionality needed on every platform

• Designed by industry experts

- The industry leaders in media silicon, platform and software are all Khronos members

• Reduces development and deployment costs

- Widespread industry adoption ensures competitive silicon and software supply chain

• Open Standards – not controlled by any single company

- Any company can join Khronos to have a voice in how standards evolve

• Royalty-free

- Khronos is committed to generating market opportunities for its members and the industry

Page 20

• All these slides and Khronos membership details at www.khronos.org

• Also at the Chinese version of the website at www.khronos.cn

• Please consider joining Khronos to help us build market opportunities

• You will be very welcome in the Khronos family!